(54) Title: SURROGATES FOR TARGETS AND IMPROVED REFERENCE PANELS
(57) Abstract 

A method to determine the reactivity of a candidate compound with a target, which method does not require the physical presence of the
target, is disclosed. A formula is provided for treating data obtained from a reference set of target substitutes, which formula is predictive
of reactivity with the target. The compound to be tested is physically assessed with respect to the reference panel, the formula is applied,
and reactivity with the actual target is predicted. Also described are panels which consist of individual members, said members comprising proteins,
wherein at least one of the members of the panel is a protein other than an immunoglobulin (Ig) or fragment thereof and wherein the
presence of said non-Ig protein enriches the panel. These panels can be tested for reactivity with an analyte to create a profile.
Such profiles can be used in pattern matching, analysis of samples and other analyses.
SURROGATES FOR TARGETS AND IMPROVED REFERENCE PANELS

Technical Field 

The invention relates to identification of compounds
that are useful in analysis, therapy and other applications
where it is desirable to provide a substance which binds
specifically to a target molecule. In particular, it concerns a
pattern-matching technique which permits candidate binding
substances to be screened in the absence of the target molecule.
The invention also relates to an improvement in the construction
of reference panels for use in profiling and pattern matching.
Specifically, the invention concerns reference panels for the
production of cross-reaction fingerprints which comprise enzymes
and/or other nonimmunoglobulin proteins as affinity targets.

Background Art 

There are numerous instances in which it is desirable
to find a ligand that specifically binds a receptor or other
target. To cite the most obvious examples, if a receptor is
responsible for activation of a particular type of cell, ligands
which bind the receptor may find therapeutic use in either
activating or preventing the activation of the receptor, with a
corresponding physiological effect on the cell. If the cell is
contained in an animal or a plant, the effect may be felt by the
entire organism. Thus, a very popular approach to designing new
drugs rests on finding appropriate binding agents for these
receptors.

Ligands that bind specific targets can also find
applications in analytical contexts. For example, antibodies
are useful components in immunoassay procedures. All of these
procedures rely on the specific interaction between an antigen
and an antibody; either partner may be the analyte.

In addition, separation procedures and other processes
with industrial application may take advantage of specific
binding. To take a very straightforward illustration, an
impurity may effectively be removed from a composition by
treating the composition with a solid support to which is bound
a "receptor" capable of binding the impurity to the relative
exclusion of the other components of the composition, provided
the affinity of the receptor for the impurity is sufficiently
greater than for the desired components.

In all of the above cases, the affinity that
characterizes the specific binding and the degree of specificity
required depend on the circumstances. Some applications are
benefited by a relatively weak interaction, whereas others
require a high affinity. Some applications are more demanding
of specificity than others.

The obvious brute force method to find a ligand that 
will bind a target of interest is physically to test the 
capability of a large number of compounds which are potential 
ligands with respect to their ability to bind the target itself. 
This method would no doubt eventually lead to finding a
successful ligand in virtually every case but is clearly more
time-consuming and labor-intensive than would be desirable for
practical utility. First, the target, e.g., a receptor, must be
produced in some physical form that can be tested and sufficient 
quantities must be provided to test the range of compounds that 
are candidates. Second, if compounds are tested in just random 
order, a large quantity of target will be needed. This, 
especially in the case of cellular receptors, may be 
prohibitively expensive. 

Several approaches have been suggested to minimize 
these difficulties. First, rather than testing compounds at 
random, a systematically varied panel of compounds could be 
used. Such systematically varied panels can conveniently be 
constructed by forming polymers from monomer units of 
predetermined characteristics. The most convenient such 
polymers are peptides, but polysaccharides, polynucleotides and 
the like could also be used. The parameters that are important 
and the manner of constructing such panels are described in U.S.
patents 4,963,263, 5,133,866 and 5,340,474, the contents of which
are incorporated herein by reference.

In addition to, or instead of, using systematically 
varied panels of compounds as candidates, the screening itself 
can be conducted in such a way as to minimize the number of 
physical measurements that are required. For example, as set
forth in U.S. patent 5,217,869, which is incorporated herein by
reference, a reactivity profile for a ligand known to react with
a target can be established by providing a standard panel of
binding agents. The profile obtained characterizes this
particular ligand known to bind the receptor. The candidate
compounds can then be tested against the same panel to obtain
their corresponding profiles. When a corresponding profile
matches that of a ligand known to be a successful binder to the
target, the compound which generated the matching profile will
have a high probability of binding the target. In an
alternative, inverse image panels are prepared with varying
characteristics, and profiles obtained for the receptor and
ligand against opposite panels are matched.

Various other technologies are directed to methods to
improve the ease with which the physical binding of receptor to
candidate ligand can be measured, such as the use of robotics,
fluorescence detection of reactivity, physical arrangements of
the panels, and so forth.

Other methods which seek to find specific binding pair
members include computer-based methods such as three-dimensional
database searching, x-ray crystallography, molecular modeling,
and the like. Other methods employ antibodies as surrogate
targets or simply rely on the behavior of the compound with
respect to related target receptors. For example, the behavior
of a compound as an inhibitor of a particular serine protease,
or of a number of serine proteases, might lead one to assume
that it will be a useful inhibitor of an additional serine
protease for which its inhibition activity has not yet been
determined. The validity of this last-mentioned method relies
on the similarity of the serine proteases that are the
"reference receptors", for which the binding characteristics of
the test compound are known, to the target receptor (serine
protease), for which the binding characteristics are not known.

The present invention provides another method to match
ligand with a target. It is especially helpful when limited
supplies of the target are available. The invention method is
especially useful in drug design projects where the target has
never been fully purified, is unstable or otherwise not
available in adequate quantities for large-scale screening, or
when the assay procedure for the target is complex and costly.
Further, the method minimizes consumption of receptor in a
program of screening against many potential ligands.

U.S. Patent No. 5,300,425, incorporated herein by 
reference, describes methods of preparing characteristic 
profiles of a particular analyte, matching similar profiles to 
correlate binding properties among various analytes, and the use 
of inverse image panels to create profiles for this purpose. In 
the methods described in the '425 patent, immunoglobulins or 
their immunologically reactive fragments were used as members of 
panels of binding ligands to obtain the characteristic profiles 
used in characterization and correlation. A modification of
this technology, described in U.S. Patent No. 5,340,474,
mentioned above, substitutes panels of diverse paralogs for the
antibodies and fragments used in the profiling panels. Paralogs
are defined as polymeric moieties, preferably of MW less than 7.5
kD, composed of monomers with characteristics such that maximal
diversity could be obtained across the panel members with a
minimum number of paralogs. By maximizing diversity, the range
of space/charge contours that characterize "chemical space" can
then be covered with relatively small numbers of compounds.

As described in the above-referenced patents, such
reference panels are useful in a number of contexts. The panel 
can be used to obtain a "fingerprint" that characterizes a 
particular analyte. The fingerprint can be used as an 
analytical tool to identify a particular substance much in the 
same way that an IR spectrum or NMR spectrum could be used. In 
addition, it was recognized that analytes that have similar 
fingerprints or similar features contained in their fingerprints 
have similar binding or reactivity properties in general or with 
respect to the property associated with the similar feature. 
Therefore, if, for example, a receptor of interest has a known 
ligand, other compounds that will bind to the receptor can be 
found by matching their fingerprints against the reference panel 
with the fingerprint obtained from the known ligand. Similar 
matching of complementary members of a binding pair can be 
obtained using inverse image sets wherein a fingerprint for a
ligand against a reference panel will match the fingerprint of
the receptor against a set of compounds which is an inverse
image of that reference panel.

Still another application for which panels of reagents
are useful is in determining analyte composition of a sample.
This application is described in U.S. 5,338,659, incorporated
herein by reference. The fingerprint obtained for an unknown
sample is matched with predetermined fingerprints or profiles
determined on standard known compositions. Certain
computational techniques can be employed to facilitate this
comparison as described in this patent. In this case, however,
it is not generally thought that a wide range of binding
capabilities will be required since the application is focused
on compositions which contain analytes, generally with related
structures, and means for correlating the fingerprints with the
other inherent properties of the analytes themselves are not
needed. Thus, in this case, it might be considered logical to
use panel members which are not necessarily antibodies nor
maximally diverse paralogs.

In the present application, an additional method of
identifying binding partners is described using a computational
combination of results against a reference panel as a surrogate
for a desired target. The reference panel illustrated is
comprised of enzymes. It is thus found, surprisingly, that
sufficient diversity of reactivity can be obtained to achieve
meaningful results, even though the enzymes used in the
illustrative reference panel were not designed by nature to have
a vast multiplicity of binding activities (as are antibodies).
Neither were the enzymes expected to have the maximal diversity
ascribable to a small number of panel members that was achieved
through the design of paralogs. Nonetheless, by utilizing
enzymes, even isoenzymes with similar activities, as members of
the reference panels, a satisfactory surrogate can be achieved
to predict binding of candidate ligands to targets, including
targets entirely unrelated by any similarity of amino acid
sequence to the enzymes that are panel members. It has thus
been found that such enzymes or reference molecules in general
should also be useful in the profiling and pattern-matching
methods described in the above-referenced patents and in the
methods described herein.



Disclosure of the Invention 

The invention utilizes what is, in effect, a surrogate
for the target to screen an arbitrary number of potential
ligands. First, a reactivity (binding) profile of the target with
respect to a "training set" of compounds, preferably having
characteristics which are systematically diverse, is prepared.
The training set might include, for example, ten different
compounds which will have varying degrees of affinity for the
target. Thus, the target profile will show a set of varying
affinities with these compounds. Rather than test additional
candidate ligands with respect to the target itself, a
"surrogate" is artificially created by testing the reactivity of
this same set of ten training compounds against a panel of
molecules to which the training set also shows varying
degrees of reactivity. This might be called a reference panel.
Each compound in the training set will therefore show a pattern
of reactivities with respect to this reference panel.

This results in a two-dimensional matrix wherein the
level of reactivity of each member of the training set with
respect to each member of the reference panel is recorded. The
level of reactivity of each member of the reference panel with
each of the training compounds is thus simultaneously recorded
in an orthogonal dimension.
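
As a minimal illustration of this matrix, the Python sketch below stores one row per training compound and one column per reference panel member, then transposes it so the same data can be read as panel-member profiles. The compound names, panel member names and reactivity values are invented for illustration and do not come from the specification.

```python
# Hypothetical reactivity values (arbitrary units); rows are training
# compounds, columns are reference panel members.
panel_members = ["ref_A", "ref_B", "ref_C"]
training_compounds = ["cmpd_1", "cmpd_2"]
reactivity = [
    [0.9, 0.1, 0.4],  # cmpd_1 against ref_A, ref_B, ref_C
    [0.2, 0.7, 0.5],  # cmpd_2 against ref_A, ref_B, ref_C
]

# Profile of each training compound across the reference panel.
compound_profiles = dict(zip(training_compounds, reactivity))

# The same data read in the orthogonal dimension: the profile of each
# panel member across the training set.
member_profiles = {
    member: list(column)
    for member, column in zip(panel_members, zip(*reactivity))
}
```

No second measurement is needed to obtain the panel-member profiles; they are the columns of the matrix already recorded.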

Each member of the reference panel will, of course,
show a different profile with respect to the training set than
did the actual target. However, some computational combination,
preferably a linear combination, of these reference panel
profiles will generate a profile which matches as closely as
possible that obtained from the target itself. That optimal
approximation constitutes a surrogate for the target. The
formula which results from the computation with respect to the
reference panel is used to estimate reactivities for newly
tested compounds. Empirically, such surrogates have good
predictive power when applied to ligands outside the training
set. A library of ligand profiles against the reference panel
can thus be searched computationally with results comparable to
a direct physical screen of the ligands.
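
One way to picture the linear-combination case is an ordinary least-squares fit. The Python sketch below is an illustration under stated assumptions, not the procedure of the specification: it assumes the fit is done by solving the normal equations, and the function names (solve_linear_system, fit_surrogate, predict) and all numerical values are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("reference profiles are not linearly independent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_surrogate(panel_matrix, target_profile):
    """Least-squares weights for the reference profiles.

    panel_matrix[i][j] is the reactivity of training compound i with
    reference member j; target_profile[i] is the measured reactivity of
    training compound i with the target.
    """
    n_ref = len(panel_matrix[0])
    gram = [[sum(row[i] * row[j] for row in panel_matrix) for j in range(n_ref)]
            for i in range(n_ref)]
    rhs = [sum(row[i] * t for row, t in zip(panel_matrix, target_profile))
           for i in range(n_ref)]
    return solve_linear_system(gram, rhs)


def predict(weights, profile):
    """Apply the fitted formula to one compound's reference-panel profile."""
    return sum(w * x for w, x in zip(weights, profile))


# Hypothetical training data: four compounds, two reference members.
panel_matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
target_profile = [0.5, 2.0, 2.5, 3.0]
weights = fit_surrogate(panel_matrix, target_profile)
estimate = predict(weights, [3.0, 1.0])  # a compound outside the training set
```

The fitted weights play the role of the "formula"; a real panel would have more members and noisy data, so the fit would be approximate rather than exact as in this toy example.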

Thus, for each compound subsequently tested,
reactivity against each member of the reference panel is
obtained and the formula derived from the training set is
applied to obtain a predicted value with respect to the target.
Rather than directly testing the reactivity of a candidate
compound with a target, it is possible instead to test its
reactivity with respect to a panel of readily available
reference receptors, apply the formula to the results, and
predict what would have happened had the target itself been
used. The larger the library of stored ligand profiles against
a reference set, the larger the increase in efficiency for
screening by surrogate.
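
Screening a stored library by surrogate can be sketched as follows. This is a hedged illustration only: the weights, compound names and profiles are invented, the function names are hypothetical, and the ranking assumes that a larger predicted value means stronger reactivity (for IC50-type data the order would be reversed).

```python
def predict_reactivity(weights, profile):
    """Apply the surrogate formula (a weighted sum) to one stored profile."""
    if len(weights) != len(profile):
        raise ValueError("profile length must match the number of weights")
    return sum(w * x for w, x in zip(weights, profile))


def screen_library(weights, library, top_n):
    """Rank stored profiles by predicted target reactivity, highest first."""
    scored = [(name, predict_reactivity(weights, profile))
              for name, profile in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical surrogate weights and stored reference-panel profiles.
weights = [0.5, 2.0]
library = {
    "cmpd_A": [1.0, 0.2],
    "cmpd_B": [0.1, 1.5],
    "cmpd_C": [2.0, 0.0],
}
hits = screen_library(weights, library, top_n=2)
```

Because the profiles are already stored, screening a new target costs only the training-set measurements plus this computation.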

It has also now been found that nonimmunoglobulin 
proteins (some of which are naturally occurring, but regardless 
of how they are actually produced) can be used successfully to 
constitute a reference panel for use in profiling analytes, 
predicting binding capabilities of candidate compounds with 
respect to targets, as well as for the analytical purposes 
described in U.S. 5,338,659. Thus, panels useful in the methods 
of the present invention can be comprised entirely of such 
proteins as enzymes, T cell receptors, olfactory receptors, 
lectins, and artificially modified proteins containing arbitrary 
binding sites. The panels may also include antibodies or 
fragments thereof, or paralogs as members; however, in the 
panels useful in the methods of the present invention the 
non-Ig/nonparalog members must "enrich" the panel beyond the 
contribution of any immunoglobulin proteins and paralogs also 
contained in the panels, as described hereinbelow. 

In one aspect, the invention is directed to a method 
to determine the ability of a candidate compound to react with a 
target which method comprises providing a surrogate for the 
target. The surrogate is that formula representing a 
computational combination, preferably a linear combination, of 
at least 2 reference reactivity profiles, which best agrees with 
the empirical binding data of the target against the training 
set of compounds. The reference reactivity profiles represent 
the reaction of each member of a panel of reference receptors
with respect to a set of compounds, which set of compounds can
be designated a "training set". The formula is then applied to
the reactivities with respect to each of the members of the
panel of reference receptors that is obtained for each candidate
compound. The outcome of applying this formula mimics what
would be found had the compound been tested directly with the
target receptor.

This aspect of the invention thus relates to a method
to identify a candidate reactive with a target, which method
comprises:

(a) providing a formula that represents a combination
of the reactivity profiles of at least two members of a
reference panel with respect to a first set of compounds, which
formula calculates a predicted profile that best matches the
reactivity profile of the target itself with respect to said
first set of compounds;

(b) testing the reactivity of said at least two
members of the reference panel with respect to a candidate; and

(c) calculating a predicted reactivity with respect
to the target for said candidate by applying said formula to the
reactivities determined in step (b) to estimate the reactivity
of the candidate with respect to the target.

A successful candidate is then identified and
synthesized from the appropriate starting materials.

Another aspect of the invention is a particularly
preferred combination of a training set and panel. In this
preferred matrix, each member of the reference panel has
effectively an inverse image member in the training set of
compounds. In this way, the number of reference panel members
and training compounds is minimized by removing redundant
overlaps.

The invention is also directed to a database of
fingerprints obtained with respect to a reference panel. The
database can be used for a variety of purposes as described
below.

In still another aspect, the invention is directed to
methods to construct the reference panels of the invention.

In another aspect, the invention is directed to a
method to characterize a single analyte, which method comprises
contacting said analyte with each member of a panel enriched by
or constituted by the above-described non-Ig proteins which
react in a multiplicity of differing degrees with said single
analyte; detecting the degree of reactivity of said analyte to
each of said members; recording said degree of reactivity of
said analyte to each of said panel members; and arranging said
recorded degrees of reactivity so as to provide a characteristic
profile of said analyte.

In another aspect, the invention is directed to panels
containing non-Ig proteins useful in the various
pattern-matching methods and to physical embodiments of the
fingerprints obtained by the four methods.

In another aspect, the invention is directed to a
method to identify a candidate, which candidate will be
effective in reacting with a target, wherein said target has a
known ligand with which it reacts, which method comprises:
contacting said candidate with each member of the panel enriched
by or constituted by the above-described non-Ig proteins which
react in a multiplicity of differing degrees with said
candidate; detecting the degree of reactivity of said candidate
to each of said panel members; recording each said degree of
reactivity of said candidate to each of said panel members;
arranging said recorded degrees of reactivity so as to provide a
characteristic profile of said candidate; comparing said profile
to a profile analogously obtained of said ligand with respect to
said multiplicity of panel members; wherein similarity of the
profile of said candidate to the profile of said ligand
indicates the ability of the candidate to react with said
target. A substance identified as a successful candidate is
then synthesized from the appropriate starting materials.
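
The profile comparison in this aspect can be sketched with a simple distance measure. The example below is an assumption-laden illustration: Euclidean distance is one possible similarity measure and is not prescribed by the text, and the function name, compound names and values are hypothetical. It requires Python 3.8 or later for math.dist.

```python
import math


def rank_candidates(ligand_profile, candidate_profiles):
    """Order candidates by Euclidean distance to the known ligand's profile.

    A smaller distance means a more similar profile against the same panel.
    """
    distances = [(name, math.dist(ligand_profile, profile))
                 for name, profile in candidate_profiles.items()]
    return sorted(distances, key=lambda item: item[1])


# Hypothetical profiles against a three-member panel.
ligand = [0.8, 0.1, 0.5]
candidates = {
    "cand_1": [0.7, 0.2, 0.5],
    "cand_2": [0.1, 0.9, 0.3],
}
ranking = rank_candidates(ligand, candidates)
```

The candidate at the head of the ranking has the profile closest to that of the known ligand and would be the first choice for direct testing.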

In a third aspect, the invention is directed to a
method to select a candidate, from a multiplicity of candidates,
that reacts specifically with a known target, which method
comprises: providing a profile of reactivity of said target
against a maximally diverse set of compounds; providing a panel
including non-Ig proteins as described above which is an inverse
image of said maximally diverse set; preparing a profile of the
reactivity of the candidate to the inverse image panel;
comparing the maximally diverse set profile of the target with
the inverse image panel profile of the candidate; and wherein
similarity of the inverse image panel profile with the diverse
set profile indicates the probability that the candidate will
bind to the target. A successful candidate is then identified
and synthesized from the appropriate starting materials. This
method can be "reversed" in that the choice of which substance
is considered a "candidate" and which a "target" is arbitrary;
i.e., the target can be profiled vs. the inverse image panel and
the candidate vs. the maximally diverse panel.

In addition, the invention is directed to a method to
determine the ability of a candidate to react with a target
which method comprises providing a surrogate for the target, as
described above, and including non-Ig proteins in the reference
panel.






Brief Description of the Drawings

Figure 1 is a flow diagram of the method to calculate
the probability of a candidate binding to target using a
surrogate.

Figure 2 shows a preferred embodiment of the training
set/reference matrix.

Figure 3 is a flow diagram of the method to determine
the profile of an analyte.

Figures 4a and 4b show typical embodiments of 
fingerprints obtained by various invention methods. 

Figure 5 is a flow diagram of the method for comparing
profiles of a ligand with a candidate compound.

Figure 6 is a flow diagram of the method to compare
inverse image profiles.

Figures 7a, 7b and 7c represent distance distributions
for profiles of 800 compounds determined with respect to
reference panels of 5, 7 and 10 reference proteins,
respectively.

Figures 8a-8c are distance distributions for points in
10-dimensional space representing profiles of 50, 100 and 1000
compounds, respectively, with respect to a panel of 10 reference
proteins.

Figures 9a-9c show distributions for the profiles with
respect to 10 reference proteins of various collections of
compounds. Figure 9a is the same as Figure 8a which shows the
distance distribution representing profiles of 50 random
compounds. Figure 9b shows the distance distribution for
profiles of 50 known pharmaceutically active compounds. Figure
9c represents a similar distribution for 50 peptides of varying
biological activity.

Figure 10 shows the results obtained when a training
set of compounds is tested with respect to a panel of reference
GST isozymes to generate a surrogate for a target receptor. The
results of testing a multiplicity of additional compounds
against the panel of reference enzymes and applying the formula
defining the surrogate are compared to the results of testing
the additional compounds directly against the target receptor.
Gray scale indicates IC50 values.

Figure 11a shows the predictions and actual empirical
data from Figure 10 as a scatter plot indicating a high degree of
correlation. Figure 11b shows the residuals from Figure 11a.

Figure 12 shows a list of 122 compounds and their
symbols used as the compound library in the results obtained
against an enriched reference panel of eight selected enzymes.

Figure 13 shows the experimental and predicted ability
of the compounds of Figure 12 to bind GRd and AdDH, as well as
the characteristic profiles of these compounds against a
reference panel where the first 12 compounds listed are the
initial training set. An additional set of 10 training
compounds used in the second iteration predictions is denoted
by adjacent black bars, with a different set of 10 for each
target.

Figures 14a and 14b show correlation plots of
predicted and experimental values according to the results shown
in Figure 13.

Figure 15 shows the correlation between fitted and
experimental binding of a multiplicity of compounds against nine
different targets.

Modes of Carrying Out the Invention 
The method of the invention permits a large number of
candidate compounds to be tested for their ability to react
with, and in particular to bind to, a target without necessity
for large amounts of the target per se. The target itself is
required only in sufficient quantity and purity to generate the
formula which creates the surrogate.

As used herein, the term "target" includes, for
example, molecules that reside on the surface of cells and
mediate activation of the cells by activating ligands, but also
is used generically to mean any molecule that binds specifically
to a counterpart. One member of a specific binding pair could
arbitrarily be called a "receptor" or "target" and the other a
"ligand". No particular physiological function need be
associated with this specific binding. Thus, for example, a
"target" might include antibodies, immunologically reactive
portions of antibodies, molecules that are designed to
complement other molecules, and so forth. Indeed, in the
context of the present invention, the distinction between
"target" and "ligand" is entirely irrelevant; the invention
concerns pairs of molecules which specifically bind each other
noncovalently with greater affinity than either binds other
molecules. However, for ease of explanation, the invention
methods will often be discussed in terms of a "target", such as
an enzyme (again, simply a molecule for which a counterpart is
sought that will react or bind with it), and "ligand" simply
represents that counterpart (such as a low molecular weight
inhibitor).

The Use of Surrogates 

In order to practice the surrogate method of the present
invention, the following elements are needed:

First, a reference set of model targets against which
measurable activity can be assessed. Various techniques for
determining reactivity of compounds with this set of reference
targets are possible, and within the skill of the art as
described above. It is important to emphasize that it is
unnecessary that the reference panel members be in any way
related by primary amino acid sequence or empirical chemical
structure or by known biological function to the target for
which they provide a model. For example, in the illustration
below, various enzymes, including glutathione S-transferase
(GST), are used as the reference receptors while the actual
target is glutathione reductase (GRd), aldehyde dehydrogenase,
or a variety of other proteins. There is no previously
discernible similarity between the enzymes of the panel and any
of the targets at the levels of primary structure or of known
enzymatic function. One of the advantages of the present
invention is that the reference proteins can be quite different
in known reactivity and in primary structure from the target,
because the predictive information is present in their relative
correlations with the target, not their homology. The reference
panel may contain as few as 1, but preferably 2-50 and more
preferably 8-25 non-Ig proteins; the total number of panel
members can also be similarly described.

Second, a training set of ligands representative of
the compounds desired to be further tested with respect to their
reactivities with the reference panel is required. If there is
a library of compounds to be further tested, a multivariate
clustering method can be used to determine representative
compounds from the library, or similar to those in the library,
for use in the training set. Similarly, compounds with
maximally systematically varying properties can also be used.
In general, this training set of compounds should include at
least as many compounds as the number of reference proteins and
preferably about 3 times that number.

Third, there must be enough target available to test
the training set empirically, although the target need not
necessarily be pure. The target must be free of undesired
interfering impurities, however.

With these compounds and reference panels in hand, the
profiles of each reference panel member with respect to the
training set and the profile of the target with respect to the
training set can be obtained by physical measurement. A fourth
requirement then is a fitting procedure to match the target's
profile with a combination of the reference panel member
profiles. In addition to techniques for linear regression,
nonlinear regression methods can also be used for this purpose,
including partly linear models as well as rule-based methods
such as clustering by recursive partitioning. Indeed, any
algorithms used in chemometric analysis or pattern recognition
generally can be combined with the physical assay data,
represented by fingerprints prepared as taught here, in order to
classify compounds. Such mathematical techniques are well
understood in the art, and result in the formula which serves as
a surrogate for testing of further compounds.

Application of the formula to the profile obtained for
a newly tested compound with respect to the reference panel
results in an estimate of the ability of the newly tested
compound to bind the target. Of course, this represents a
probability and not an absolute. The predicted result amounts
to a screening procedure to identify compounds with a high
probability of binding the target (or not binding the target).

While one compound at a time can be tested with
respect to the reference panel and the formula applied to
estimate a target reactivity value, the most useful application
of the method of the invention pertains to screening libraries
of candidate compounds. Thus, quite frequently, a large number
of candidate compounds is available and the method of the
invention can be used to select those which do and those which
do not bind the target. When the method is thus applied to
libraries, the results from the newly screened candidates can be
added, if desired, to the training set and the process repeated
in an iterative loop. Thus, the original training set could be
supplemented with selected compounds which are estimated to bind
the target strongly and selected compounds which are estimated
to bind the target only weakly or undetectably and these
compounds used in addition to, or instead of, certain members of
the training set to obtain the profiles with respect to
reference panel members and actual targets. The formula can
then be recalculated taking account of these additional members.

Further, not all profiles of the reference panel
proteins with regard to the training set need be included, in
the end, in the formula. That is, some of the coefficients for
model receptor profiles in the linear combination may be zero or
negative.

The general approach to the use of surrogates is 
outlined in Figure 1. 

In Figure 1, a fingerprint database is first assembled
according to the procedure shown in Figure 3 described below for
a multiplicity of compounds against a representative reference
panel. The reference panel itself will have been selected using
preliminary data to include members that have the ability to,
collectively, react with a wide range of compounds but wherein
each panel member reacts with different sets of such compounds.

When a suitable panel has been chosen, a training set
is also selected from among the profiles for testing against the
target. Each of the members of the training set is thus tested
and the resulting reactivity with respect to the target is
obtained for each member of the training set. This amounts to a
profile of the target using the training set as panel members.
The fingerprints of the training sets can then be inverted
conceptually, since the same data points are involved, to
provide a profile of each member of the panel with respect to
the compounds of the training set. These conceptually inverted
profiles can be analyzed mathematically, for example, using
linear regression analysis, to obtain a mathematical surrogate
as shown in Figure 1.

The profile of any candidate, including candidates for
which profiles are already available in the database, can be
mathematically treated according to the surrogate to predict the
reactivity with the target. Successful candidates can be
identified using the surrogate-generated predictions and the
successful candidates synthesized using the relevant starting
materials. There is also a feedback loop which permits such
predictions to be tested and revisions to the training set made
on the basis of these predictions leading to modifications of
the surrogate.

Illustrative Method 

The method of the invention can be further illustrated 
using a simplified hypothetical matrix, and a linear regression 
method of combination. 

The matrix set forth below represents a hypothetical
matrix used to illustrate the generation of the relevant formula
as surrogate. Across the top, labeled MR1-MR5, are five panel
members, such as enzymes, used as reference model targets for
the actual target receptor TR. Along the side, labeled TC1-TC5,
are five training compounds which bind or otherwise react in
varying degrees with each of the reference panel members. The
degree of reactivity is arbitrarily assigned a value on a scale
of 1-10 where 10 indicates high reactivity and 1 indicates low
reactivity. Generally, a logarithmic scale of measured values
is used.



                     Sample Matrix

        MR1   MR2   MR3   MR4   MR5    PR    TR
TC1       6     1     1     7     2     2     2
TC2       2     4     2     6     2     4     4
TC3       1     3     8     1     5     6     6
TC4       5     9    10    10     1     8     8
TC5       9     1    10     5     9    10    10

In these hypothetical results, profiles for each of
the set of training compounds with respect to the reference
panel are shown in the horizontal rows and profiles for each
reference enzyme with respect to the training set of compounds
are shown in the vertical columns. Thus, for example, for MR1,
there is a moderately high level of reactivity with TC1, low
reactivity with TC2, very low with TC3, moderate reactivity with
TC4 and very high reactivity with TC5. Thus, each of MR1-MR5
has a particular profile of reactivity with regard to the
training set. On the right, marked TR, the target receptor
shows a profile against the training set with monotonically
increasing reactivities over the TC1-TC5 range, a pattern
grossly different from any of the reference profiles.

A formula is then generated by assigning weights to
each of the elements of the five MR1-MR5 profiles to obtain a
predicted target receptor profile that matches that actually
obtained for the target. The weighting values will need to be
the same for each element of the profiles. Thus, the weights
applied to the TC1 element with respect to how the values from
MR1-MR5 are counted have to be the same as those applied to TC2.
Ultimately the algorithm will be of the form

A(MR1) + B(MR2) + C(MR3) + D(MR4) + E(MR5) = PR

where PR is the value predicted according to the surrogate,
shown in the table as PR. Each of the coefficients A-E will
have a numerical value; some of the coefficients may be zero.
This same equation, with the same values of A-E, will be used to
calculate the predicted reactivity with the target receptor for
any individual candidate compound.

In the above example, A=+2; B=+3; C=-1; D=-2; E=+1.
Here the coefficients allow a perfect match between the
Predicted Receptor (PR) profile and the target receptor (TR)
profile with respect to the training set. In general, and if
more compounds are included in the training set, a perfect match
may not be possible; but the closest approximation obtainable is
useful to the same end.
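
The derivation of the coefficients can be sketched in a few
lines of Python. The sketch below is illustrative only: it
assumes the five-by-five Sample Matrix, in which the number of
training compounds equals the number of panel members, so that
the weights can be solved for exactly rather than fitted by
regression. The function name solve_weights and the use of
exact rational arithmetic are choices made for this
illustration and are not taken from the specification.

```python
from fractions import Fraction

# Training-set reactivities from the Sample Matrix (rows TC1-TC5,
# columns MR1-MR5) and the measured target profile TR.
PANEL = [
    [6, 1, 1, 7, 2],
    [2, 4, 2, 6, 2],
    [1, 3, 8, 1, 5],
    [5, 9, 10, 10, 1],
    [9, 1, 10, 5, 9],
]
TARGET = [2, 4, 6, 8, 10]


def solve_weights(panel, target):
    """Solve panel * w = target exactly by Gauss-Jordan elimination.

    Hypothetical helper: assumes a square, nonsingular panel matrix.
    """
    n = len(panel)
    # Augmented matrix [panel | target] in exact rational arithmetic.
    rows = [[Fraction(v) for v in row] + [Fraction(t)]
            for row, t in zip(panel, target)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so that the pivot becomes 1.
        scale = rows[col][col]
        rows[col] = [v / scale for v in rows[col]]
        # Clear this column from every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b
                           for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


weights = solve_weights(PANEL, TARGET)
print([int(w) for w in weights])  # [2, 3, -1, -2, 1]
```

With more training compounds than panel members the system is
overdetermined, and a least-squares fit would take the place of
the exact solution shown here.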

Thus, for any new compound, a prediction for
reactivity with target is obtained as follows: A profile that
provides reactivity values for MR1-MR5 is obtained. The values
obtained are then substituted into the formula set forth above,
with the predetermined values of A-E. A predicted value is
calculated. Thus, a new candidate compound, which gives a
profile with values of MR1=8, MR2=9, MR3=4, MR4=7 and MR5=5,
will be evaluated according to the formula:

(+2)(8) + (+3)(9) + (-1)(4) + (-2)(7) + (+1)(5) = PR

to provide a predicted reactivity value of 30. This
demonstrates that the method can predict higher reactivity than
available in the training set. Confirmed high reactivity
compounds can be added to the training set to refine the
formula.
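
The same arithmetic can be expressed as a short Python sketch.
The coefficient values and the candidate profile are those of
the illustrative example; the function name predict_reactivity
is hypothetical and is used here only for illustration.

```python
# Coefficients A-E from the illustrative example (weights for MR1-MR5).
COEFFS = [2, 3, -1, -2, 1]


def predict_reactivity(profile, coeffs=COEFFS):
    """Return the weighted sum of a compound's panel profile."""
    if len(profile) != len(coeffs):
        raise ValueError("profile and coefficients differ in length")
    return sum(c * x for c, x in zip(coeffs, profile))


# Profile of the new candidate against MR1-MR5, as given in the text.
candidate = [8, 9, 4, 7, 5]
print(predict_reactivity(candidate))  # 30
```

Applying the same function to the TC1-TC5 rows of the Sample
Matrix reproduces the PR column, which is a convenient check
that the coefficients have been entered correctly.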

Examples 3 and 4 set forth below indicate that this
general approach is successful in predicting the reactivity of
any candidate compound with a target; accordingly, no further
supplies of target receptor are required in order to test an
arbitrary number of compounds.

In a preferred embodiment of the original matrix, both
the reference panel and the training set are maximally diverse
and represent inverse images. This is illustrated in Figure 2
which shows a hypothetical matrix of reference panel members and
reference binding agents. As illustrated in the figure,
reference panel member 1 and set member 1' interact strongly;
reference panel member 2 and set member 2' do so; reference
panel member 3 and set member 3', etc. There is relatively weak
interaction between, say, set member 3' and reference panel
member 2 or reference panel member 1. In effect, the reference
panel and the training set represent inverse images.

Kits can be prepared which include, in separate
containers, each of the members of the training set, each of the
members of the reference panel, and the target, along with
reagents for testing their reactivity.

Inclusion of Non-Ig Proteins

Performance of the above surrogate method resulted in
the surprising discovery that fingerprint matching to identify
compounds with desirable properties, such as the ability to bind
to a desired target, the ability to behave as an enzyme
inhibitor, a specific pharmacological activity, and so forth,
can be based on panels which are substantially enriched by
proteins that are neither immunoglobulins and their fragments
nor specifically designed maximally diverse paralogs.
Surprisingly, a range of complementarity or other interactive
ability sufficient to cover substantially all of "chemical
space" can be achieved by employing naturally occurring proteins
such as enzymes, lectins, T cell receptors, olfactory receptors
and the like, or by employing proteins which are modified forms
of naturally occurring proteins. By choosing a suitable set of
these proteins, a sufficient range of reactivity can be obtained
to provide enhanced fingerprints in these contexts. Thus, panels
enriched by or constituted by nonimmunoglobulin proteins serve
to provide suitable reference sets of data points for obtaining
a characteristic profile of an individual substance. The
profiles can be manipulated in a number of ways as further
described below.



It would be possible and is within the scope of the
invention to construct panels which contain as members not only
these proteins but also antibodies and/or paralogs or other
arbitrarily chosen quantitative reactivity events. When the
word "reactivity" is used in the present application, it refers
to noncovalent interaction between the stated participants. In
a sense, then, "reactivity" is substantially similar to
noncovalent binding. Such binding may or may not be coupled
with catalytic or allosteric responses.

However, the panel must at least be enriched by the
alternative proteins. A protein "enriches" the panel if its
membership in the panel does any of the following or some
combination:

(a) expands the coverage of the panel over chemical
space (see below);

(b) increases the average distance between
fingerprints of different compounds in the
library (see below);

(c) decreases the number of reference panel members
required to obtain a given number of principal
components (see below).

(a) It is, of course, desired to cover all of
chemical space. However, 90%, but preferably 95%, coverage is
generally satisfactory. "Covering" chemical space means that
all compounds tested against the panel show at least some
reactivity with at least one panel member, and preferably 3-5.

(b) The "distance" between fingerprints or profiles
can be best understood by the device of assigning each profile
to a point in n-dimensional space where the reactivity with
respect to each of n reference panel members is plotted
individually in n dimensions. The distance between the points
is then the distance between the profiles. It is readily seen,
however, that this is just a convenient way to quantitate
differences between profiles; any other method for quantitating
profiles could also be used, such as recursive partitioning of
data as in a branching tree clustering hierarchy.

(c) "Principal components" relates to degree of
correlation in reactivity in accordance with standard
multivariate statistical usage. For example, if there are 10
members in the panel and all react nearly uniformly with a given
set of compounds, they furnish only one principal component. If
each possible pair of panel members shows no correlation in
binding reactivity to a given set of compounds, there are 10
principal components.
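
Criteria (a) and (b) lend themselves to direct computation.
The Python sketch below is illustrative only and rests on
assumptions not stated in the specification: a compound is
counted as "covered" when its highest reactivity meets a chosen
detection threshold, and distance is taken as ordinary Euclidean
distance in n-dimensional space. The fingerprint values and the
function names are hypothetical. The principal component count
of criterion (c) is not computed here.

```python
import math
from itertools import combinations


def coverage(fingerprints, threshold):
    """Fraction of compounds reacting with at least one panel member."""
    hits = sum(1 for fp in fingerprints if max(fp) >= threshold)
    return hits / len(fingerprints)


def euclidean(p, q):
    """Straight-line distance between two profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_pairwise_distance(fingerprints):
    """Average distance over all distinct pairs of fingerprints."""
    pairs = list(combinations(fingerprints, 2))
    return sum(euclidean(p, q) for p, q in pairs) / len(pairs)


# Hypothetical fingerprints: one row per compound, one column per
# panel member, on an arbitrary 1-10 reactivity scale.
fingerprints = [
    [6, 1, 1],
    [2, 4, 2],
    [1, 1, 1],  # reacts with no panel member above the threshold
    [5, 9, 10],
]
print(coverage(fingerprints, threshold=3))  # 0.75
print(mean_pairwise_distance(fingerprints))
```

A candidate protein could be judged to enrich the panel under
(a) or (b) by appending its column of reactivities to each
fingerprint and observing whether either value increases.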

Thus, the proteins included in the panels used in the
invention method must enhance or enrich the panel in at least
one of the foregoing ways. The panels useful in the invention
must include at least one non-Ig protein that enriches the
panel. Preferably 10% of the members are non-Ig proteins, more
preferably 20% and most preferably 50% or more.

The panels may consist entirely of non-Ig proteins or,
indeed, entirely of enzymes, or entirely of lectins or entirely
of T cell receptors or entirely of olfactory receptor proteins
or entirely of receptor proteins in general or may be composed
of mixtures of these. Taking as an example panels where the
inclusion of enzymes is the focus, typically the panels will
contain at least 2 enzymes, preferably 3 enzymes, more
preferably 4-6 enzymes, and most preferably 7-25 enzymes. It
has been found that employing no more than 15 enzymes can still
yield acceptable results over virtually all of chemical space;
however, there is no arbitrary upper limit to the number of
enzymes in the panel other than the practical consideration that
the law of diminishing returns sets in fairly clearly above
numbers in this range. Similar comments could be made
concerning any other particular class of proteins mentioned
above.




The proteins in the panel can preferably be chosen as
follows:

An iterative process is used to select the members of
any panel for use in fingerprinting. A few candidate panel
members including non-Ig proteins are arbitrarily chosen and
fingerprints for any arbitrary set of compounds are obtained.
Comparisons are made between the fingerprints. Any method of
comparison could be used, but some particularly effective
methods are described hereinbelow. Whatever the method of
comparison, compounds which have very similar fingerprints are
clearly redundant members of the library of compounds for this
purpose and only one of the compounds in such a group should be
retained in the selection set. The remaining fingerprints are
then again compared for similarity, only this time an inverse
profile is obtained for each of the reference panel members with
respect to the remaining compounds in the selection set. Now it
becomes possible to discard panel members which provide similar
inverse profiles with respect to the compound library. Thus, if
three candidate members in the panel seem to provide similar
reaction patterns across the compound library tested, only one
of the members is retained in the panel.
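
One pass of this two-stage pruning can be sketched in Python as
follows. The sketch is illustrative only. It assumes Euclidean
distance as the method of comparison and a fixed tolerance below
which two profiles are treated as redundant; neither choice is
prescribed by the specification, and the data and function names
are hypothetical.

```python
import math


def distance(p, q):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def drop_redundant(profiles, tolerance):
    """Return indices of profiles not within `tolerance` of a profile
    already kept (greedy, so the result depends on input order)."""
    kept = []
    for i, profile in enumerate(profiles):
        if all(distance(profile, profiles[j]) > tolerance for j in kept):
            kept.append(i)
    return kept


def transpose(matrix):
    """Turn compound fingerprints (rows) into panel inverse profiles."""
    return [list(col) for col in zip(*matrix)]


def prune(matrix, tolerance):
    """Prune redundant compounds (rows), then redundant panel members
    (columns) using inverse profiles over the retained compounds."""
    rows = drop_redundant(matrix, tolerance)
    reduced = [matrix[i] for i in rows]
    cols = drop_redundant(transpose(reduced), tolerance)
    return rows, cols


# Hypothetical data: compounds 0 and 1 are near-duplicates, and
# panel members 0 and 2 respond almost identically.
matrix = [
    [9, 1, 9],
    [9, 1, 8],
    [1, 8, 1],
    [5, 5, 5],
]
print(prune(matrix, tolerance=1.5))  # ([0, 2, 3], [0, 1])
```

In practice the pass would be repeated as new compounds and new
candidate panel members are tested, as described in the
following paragraph.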

If the panel, including non-Ig proteins, which has
been thus reduced for redundancy continues to generate viable
fingerprints for all new compounds and if the new compounds do
not reveal any further redundancy in the panel, then the panel
is satisfactory. However, if the panel fails to provide a
meaningful fingerprint for any new compound, additional members
need to be added to the panel, although it becomes harder and
harder to find a new member which provides distinct patterns as
compared to those already present. The screening for new
members in the panels is preferably conducted on compounds that
were not detected with the members already present. The new
member candidates are then evaluated on a maximally diverse set
of the compounds already tested. The ideal panel provides high
coverage with high independence and a small number of members,
preferably under 100, more preferably under 25 and most
preferably under 15.

It has been found that among 100 enzymes of widely
varying function, 12 of them provide 95% coverage against 1000
compounds from a wide variety of chemical classes. The 12
enzymes are substantially independent, since about 9
statistically meaningful principal components are needed to
describe the 12; if they were totally independent, 12 would be
needed.

Arrangement of the Panel 

The members, including non-Ig proteins, that comprise 
the panel must be physically embodied in such a way that an 
individual result for each member can be retrieved and recorded 
so as to construct the profile. Of course, it is possible 
simply to react each member independently in an individual 
reaction container with the relevant analyte; to record the 
results of each container individually; and to manually 
construct the profile that results. More convenient alternative 
approaches involve displaying the panel members in an orderly 
fashion on some type of solid support, such as a microtiter 
plate or other support with multiple test regions and to scan 
the regions for the individual results. The scan can assess the 
results in each region sequentially or simultaneously using 
known technology. 

In general, the reactivity of the analyte with each 
test region or container is assessed in terms of the binding 
affinity of the analyte to the panel member contained therein. 
The art is replete with the methodologies for detecting the 
degree of binding of one substance to another. In a prototypic 
approach, one partner, in this case the panel member, is bound 
to solid support and the other partner, in this case the 
analyte, is labeled using radioisotopes, fluorescence, enzymes 
and the like, and after contact of the analyte with the
supported panel member, the support is washed free, if
necessary, of unbound analyte and the amount of label measured.
In the alternative, the binding affinity can be measured by
competition between the analyte and a labeled competitor. One
method of such competitive binding described in the
above-referenced patents involves competition between the
analyte and a diverse mixture of labeled compounds which mixture
is sufficiently diverse that the mixture binds uniformly to every
member of the test panel so that the diminution in label 
directly gives a measure of the degree of binding for the 
competitor analyte. Methods are also available to detect the 
degree of binding between two substances in homogeneous media as 
in, for example, the EMIT technology. In all of these methods, 
any conventional method of labeling may be used. Among 
preferred methods is the use of fluorescent label competition, 
for example using fluorescence polarization. The invention does 
not concern specific methods of detecting the degree of binding, 
and any conventionally used procedure for measuring the binding 
affinity between analyte and the member of the panel can be 
used. 

It is preferable to use assay methods with a wide
dynamic range. Quantitation of affinity by IC50 for inhibition
of substrate turnover, or other competitive binding events, can
often be measured over more than five log units of potency, for
example.
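
By way of illustration only, measured IC50 values can be placed
on a logarithmic scale before being entered into a fingerprint.
The sketch below assumes the common pIC50 convention, the
negative base-10 logarithm of the IC50 in mol/L, which is not
recited in the specification; the IC50 values shown are
hypothetical.

```python
import math


def pic50(ic50_molar):
    """Negative base-10 logarithm of an IC50 expressed in mol/L."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)


# Hypothetical IC50 values spanning five orders of magnitude.
ic50_values = [1e-4, 1e-6, 1e-9]
print([round(pic50(v), 2) for v in ic50_values])
```

On such a scale a hundredfold difference in potency corresponds
to a difference of two units, so that weak and strong binders
can be represented in the same profile.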

Profile Determination 

Determination of a characteristic profile provides the
basic tool for the matching techniques of the invention. Each
profile or fingerprint is determined by measuring the individual
reactivities, such as binding affinities of the analyte for each
member of the panel. The reactivities are then recorded in an
orderly arrangement so as to provide this characteristic
profile.

Figure 3 is a flow chart showing the steps in 
obtaining the characteristic profile for an analyte. 

First, the analyte is contacted with each panel member
(panel member i) in a panel of n members. For each of these
contacts, the reaction of the analyte with the panel member is
detected and measured. Then, the extent of reaction is recorded
to obtain a data point for reactivity associated with each of
the n members of the panel. Then the recorded data points are
arranged in an orderly manner to obtain the profile. One
convenient way of arranging these data points is to plot each
reactivity in one of the dimensions of n-dimensional space.
However, other means of recording the profiles are also
available.

Figures 4a-4b provide examples of the manner in which
such profiles can be recorded. In Figure 4a, the analyte is 
directly tested with respect to binding affinity for a 
theoretical panel containing ten enzymes. The results are 
recorded in the form of a bar graph. Alternatively, as shown in 
Figure 4b, the results may be tabulated in terms of arbitrary 
categories of binding strength represented by a spectrum of 
white-black to indicate degree of affinity. For computer 
analysis, numerical values are most useful, although hard to 
interpret by visual inspection. 

Once the characteristic profile of an analyte is
recorded, either as shown in Figures 4a-4b, or in other graphic,
numeric, or electronic form, it can be used for a variety of
purposes. One clear purpose would be simply to characterize the
analyte in order to be able to match the profile with that of an
unknown compound. The profile can also be used to analyze
concentration of the analyte in a sample, including samples
which contain mixtures of analytes. The profile can also be
utilized to compare the binding capacity of a candidate
substance to that of a ligand known to bind to a target. This
can be achieved through direct matching, or through matching of
the profile with that of a receptor using inverse image panels,
as described below.

Pattern Matching to Identify Desired Reactivities 

One application of the panel results in the 
identification of diagnostic features of molecules or 
"pharmacophores" that interact with receptor targets. The 
pattern-matching techniques are precisely the same as those 
described for panels containing antibodies or paralogs as set 
forth in the above-referenced patents 5,300,425 and 5,340,474
and in U.S. 5,338,659. 

Pattern matching can be used to identify compounds
which have a desired activity physiologically. For example,
compounds providing fingerprints against the panels of the
invention which are similar to those of compounds that have
antiinflammatory activity can be predicted to have
antiinflammatory activity. The matching techniques can vary,
but those described in the above-mentioned U.S. 5,338,659 are
particularly useful.

Figure 5 shows a direct method of identifying a
substance which will be successful in binding a desired target.

As shown, a profile is obtained for the candidate in a
manner similar to that described above for an analyte in
general. The same steps, using the same panel members, are
performed with respect to a ligand known to bind to the desired
target. Thus, the profile of the candidate substance and that
of a ligand are obtained. These profiles are compared, for
example, in the manner described herein by determining the
distance between the points generated by plotting the
reactivities against the panel members in n-dimensional space,
and a candidate which has a profile similar to that of the
ligand (for example, close to the position of the point
representing the profile for the ligand in n-dimensional space)
is identified as a successful candidate. The successful
candidate is then synthesized using the appropriate relevant
starting materials to obtain the desired substance.
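
The direct comparison can be sketched in Python as a ranking of
candidates by their distance from the profile of the known
ligand. The sketch is illustrative only: Euclidean distance is
assumed, and the candidate names and profile values are
hypothetical.

```python
import math


def distance(p, q):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rank_candidates(ligand_profile, candidates):
    """Sort candidate names by increasing distance from the ligand."""
    return sorted(
        candidates,
        key=lambda name: distance(candidates[name], ligand_profile),
    )


# Hypothetical profiles against the same four-member panel.
ligand = [8, 2, 5, 1]
candidates = {
    "cand_A": [7, 2, 6, 1],
    "cand_B": [1, 9, 2, 8],
    "cand_C": [8, 3, 5, 3],
}
print(rank_candidates(ligand, candidates))
# ['cand_A', 'cand_C', 'cand_B']
```

The candidates nearest the ligand in n-dimensional space appear
first and would be the first to be confirmed by direct assay.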

An alternative approach is to match profiles
determined for the candidate substance and the desired target
which are obtained in respect of inverse image panels. This
approach is outlined in Figure 6.

An inverse image set refers to a set of members each
of which is complementary to a member of the reference panel
described above. Figure 2 will be helpful in connection with
the following description. Figure 2 shows a reference panel
where the representative molecules have particular defined
shapes numbered 1-n. An inverse image panel would correspond to
a set of molecules that is complementary to these shapes shown
as 1'-n' in the figure. As stated above, such an inverse image
panel would form an ideal training set in constructing the
surrogates of the invention. It can also be constructed
deliberately for use in the pattern-matching techniques to be
described. The members of the inverse image panel are called
"reference complements" because of their complementary shape.
Thus, for example, reference complement 1' exactly fits and
binds reference panel member #1; reference complement 4' exactly
binds and fits reference panel member #4, and so forth. The
construction of inverse image panels is also described in U.S.
Patent 5,300,425.

The general pattern-matching procedure relevant here 
is outlined in Figure 6. 

In Figure 6, a profile of the candidate compound is
obtained in a manner similar to that of Figure 3 treating the
candidate with each panel member and obtaining a profile as
shown in the left-hand column. The profile of the desired
target is obtained with respect to each reference complement of
an n-member set presenting the inverse image of the reference
panel, as shown in the right-hand column. Again, the profiles
are compared and similar profiles are identified to obtain a
successful candidate substance which will bind to target. The
successful candidate substance is then synthesized from the
appropriate starting materials.

Of course the inverse image panels could be reversed;
the profile of the target is obtained with respect to the
reference panel and that of the candidate with respect to its
inverse image reference complement panel.

Thus, no matter how complex its structure, if the
candidate compound has a structural feature which effects its
binding to a member of the reference panel, such as the
arrowhead configuration designed to fit the triangular-shaped
cavity shown for reference panel member #1 in Figure 2, it will
bind with a target that has a surface feature (again, no matter
how complex the remainder of the molecule) which resembles the
triangular cavity shown in reference panel member #1 in Figure
2. Of course, this feature will cause a substance to which it,
itself, binds by virtue of this feature to bind to reference
complement 1' in the inverse image panel. Because of this
common feature, then, the profile of the candidate with respect
to the reference panel will match that of the target with
respect to the inverse image panel. Of course, because the
methods of the invention operate on empirical fingerprints, it
is unnecessary to know what the complementary motifs are in
terms of their molecular structure.

U.S. 5,338,659, referenced above and incorporated
herein, discloses a particularly efficient approach to making
comparisons between profiles. This approach is to plot the
obtained profiles or fingerprints in n-dimensional space,
wherein n is the number of members of the relevant panel and the
location of the point in each dimension is a function of its
reactivity with each panel member. The proximity of the points
representing the unknown and any of the predetermined profiles
in n-dimensional space represents the similarity of their
compositions. Multiparametric statistical techniques can also
be employed to define which of the n dimensions have the
greatest information content relative to the assay so as to
permit a selection of the minimum number of characteristics or
dimensions to be measured.

In order to use the profiles as tools in predicting
properties of test substances or in other pattern-matching
applications, regardless of the specific pattern-matching
techniques used, the reference panel should be capable of
covering at least 90% of the chemical space, and should provide
an average distance between fingerprints of all pairs of at
least about three times the noise level generated by replicate
determination of profiles for a single compound. In addition,
the fingerprint provided by the reference panel should provide
at least five principal components with respect to the range of
small organic compounds that are available commercially. For
example, this range is typified by any set of approximately
1,000 compounds among those available from the Aldrich Catalog
of Fine Chemicals.
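
The coverage and distance criteria can be checked
computationally, as in the Python sketch below. The sketch is
illustrative only and makes assumptions not found in the
specification: the noise level is estimated as the mean pairwise
distance among replicate profiles of a single compound, coverage
is judged against a chosen detection threshold, and all data and
function names are hypothetical. The principal component
criterion is not evaluated.

```python
import math
from itertools import combinations


def distance(p, q):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_distance(profiles):
    """Average distance over all distinct pairs of profiles."""
    pairs = list(combinations(profiles, 2))
    return sum(distance(p, q) for p, q in pairs) / len(pairs)


def panel_is_adequate(fingerprints, replicates, detect_threshold,
                      min_coverage=0.90, min_signal_to_noise=3.0):
    """Check the coverage and distance criteria stated in the text.

    `replicates` holds repeated profiles of one compound; their mean
    pairwise distance is taken here as the noise level (an assumption).
    """
    covered = sum(1 for fp in fingerprints if max(fp) >= detect_threshold)
    coverage = covered / len(fingerprints)
    noise = mean_distance(replicates)
    signal = mean_distance(fingerprints)
    return (coverage >= min_coverage
            and signal >= min_signal_to_noise * noise)


# Hypothetical fingerprints and replicate determinations.
fingerprints = [[6, 1, 1], [2, 4, 2], [1, 3, 8], [5, 9, 10]]
replicates = [[6, 1, 1], [6, 2, 1], [5, 1, 1]]
print(panel_is_adequate(fingerprints, replicates, detect_threshold=3))
```

For the hypothetical data shown, every compound reaches the
threshold and the mean distance between fingerprints exceeds
three times the replicate noise, so the check returns True.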

Use of Surrogates 

The panels are also used to create surrogates for a 
desired target in order to evaluate binding of candidate 
compounds as described above.

Applications

Applications of these pattern-matching processes with
respect to the profiles or fingerprints are manifold. For
example, it is possible, because of their ease of synthesis or
because of their native occurrence, to obtain peptides or
proteins that behave in biologically important ways. However,
peptides and proteins are not attractive as drugs as they cannot
easily be orally administered and metabolized and present
problems in manufacturing and storage; small molecules are
preferred. By matching profiles, either by direct
pattern-matching, inverse panels, or surrogates, suitable small
molecule substitutes can be found.

Another important application is the prediction of
toxicity in candidate drugs. Comparing the appropriate aspects
of the fingerprint of the candidate drug with features of
fingerprints of known toxins permits such prediction. Likewise,
construction of surrogates for proteins similar in sequence or
function to target allows side effects due to cross-reactions to
be estimated in advance of animal testing using only trace
amounts of the related protein.

Still another application of the profiles of the
invention and their correlation relates to providing parameters
for improving the three-dimensional models of spatial
arrangement of pharmacophores obtained by conventional computer
modeling. Comparison of the fingerprint for a particular
candidate compound, whose three-dimensional structure is to be
compared with an idealized description of an appropriate ligand
(the pharmacophore), to fingerprints of compounds having related
activities provides substantial additional empirical information
which can permit construction of more accurate three-dimensional
representations of peptides or other macromolecules subject to
conformational variation.

The techniques of the invention also permit the
reduction of large libraries of compounds to smaller sets that
will, nevertheless, contain the compounds most likely to have a
desired biological activity. The reduced size of the library
permits more sophisticated tools to be applied to prediction of
the affinity of the compounds in the reduced library for a
target. Thus, because the size of a library is reduced,
extensive conformational analysis of the ligand in the active
site as well as of conformational changes of the active site in
the presence of the ligand can be studied for the library
members. This also permits a more accurate analysis of the
electrostatic interactions between the ligand and the binding
site, including solvation effects which are related to
desolvation of the binding cavity and the ligand when these
interact. The reduced library enabled by the invention is
considerably smaller than that generally used in
three-dimensional databases, allowing proportionately more
computational effort to be expended on each compound.

The most general application is simply to provide the
maximum functional diversity for a given size of chemical
library; this chemical library provides a core set for
screening, a core set for computer screening, training sets,
generally, and chromatographic ligands. This application is
especially useful when applied to a combinatorial library, in
which large numbers of quite similar compounds are typically
found.
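
One way in which a functionally diverse core set might be drawn
from a library of fingerprints is sketched below in Python. The
greedy "max-min" selection shown is a generic technique chosen
for illustration and is not a procedure recited in the
specification; Euclidean distance is assumed, and the library
values and function names are hypothetical.

```python
import math


def distance(p, q):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def select_diverse(fingerprints, k):
    """Greedy max-min selection of k mutually distant fingerprints.

    Starts from index 0 and repeatedly adds the compound whose
    nearest already-selected neighbour is farthest away.
    """
    if not 0 < k <= len(fingerprints):
        raise ValueError("k must be between 1 and the library size")
    selected = [0]
    while len(selected) < k:
        remaining = [i for i in range(len(fingerprints))
                     if i not in selected]
        best = max(
            remaining,
            key=lambda i: min(distance(fingerprints[i], fingerprints[j])
                              for j in selected),
        )
        selected.append(best)
    return selected


# Hypothetical library with two tight clusters and one outlier.
library = [
    [1, 1], [1, 2], [2, 1],  # cluster of similar compounds
    [9, 9], [9, 8],          # second cluster
    [1, 9],                  # outlier
]
print(select_diverse(library, 3))  # [0, 3, 5]
```

For the hypothetical library shown, one compound is drawn from
each cluster together with the outlier, so that the near
duplicates typical of a combinatorial library are passed over.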

The utility of the pattern-comparison approach has
been successfully demonstrated in identifying nonsteroidal
antiinflammatory drugs (NSAIDs). Many NSAIDs have been selected
based on their ability to inhibit cyclooxygenase (COX, also
known as prostaglandin synthase) which catalyzes the first step
in the synthesis of prostaglandins, as well as on their activity
in animal models. A second cyclooxygenase, COX-II, has recently
been discovered which is an isoenzyme of the originally known
COX-I. COX-II is largely restricted to cells of the immune
system and is believed more important than COX-I in
inflammation.

As set forth in more detail in Example 2, fingerprints
were obtained for several hundred compounds, including two
NSAIDs, using the protein panels of the invention (containing
8-10 proteins). Examination of the fingerprints of these two
compounds showed a common feature which proved to be shared with
several additional known NSAIDs for which profiles or
fingerprints were subsequently obtained. The fingerprints for
the several hundred compounds already tested were then searched
for the presence or absence of this feature. Twelve compounds
were found and these were tested for their ability to inhibit
COX-I. Two compounds showed moderate and one measurable but low
ability in this respect, although no NSAID activity had
previously been reported for these compounds.

The panel of proteins was then optimized as generally 
described above and used to evaluate a group of structurally 
diverse compounds containing seven known COX inhibitors and six 
inhibitors of other targets. The fingerprints obtained 
permitted completely accurate prediction of whether or not the 
compound was a COX inhibitor, although the proteins in the panel 
did not represent any proteins which were related to COX either 
by homology or by enzyme activity. 

Fingerprint Databases 

The reference panels of the invention, and reference 
panels generally, can be used to generate a fingerprint database 
which contains the fingerprints of a library of compounds in 
physically stored form to permit their retrieval. This form may
be a "paper" database, but is preferably computer readable.
The database will generally contain the fingerprints of
over 1,000 compounds with respect to a panel of
proteins or other members wherein the number of panel members is
less than three times the number of principal components
represented in the panel. The compounds will represent a range
of binding affinities for the panel members which is greater
than three logs, expressed as IC50s. In the selected database,
more than 95% of the compounds will provide fingerprints which
are visible, i.e., are greater than the noise distance from the
origin, and will have a mean separation from their nearest
neighbors of more than three times the noise distance.
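
The two quality criteria just stated can be illustrated with a short sketch. It is not part of the disclosed method: it assumes that each fingerprint is a list of log-scale binding values, that distance is Euclidean, that the "origin" is the all-zero vector, and that the noise distance is supplied by the user. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two fingerprints of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fraction_visible(fingerprints, noise):
    """Fraction of fingerprints lying farther than `noise` from the origin."""
    origin = [0.0] * len(fingerprints[0])
    visible = sum(1 for f in fingerprints if euclidean(f, origin) > noise)
    return visible / len(fingerprints)


def mean_nearest_neighbor_distance(fingerprints):
    """Mean, over all fingerprints, of the distance to the nearest other one."""
    total = 0.0
    for i, f in enumerate(fingerprints):
        total += min(
            euclidean(f, g) for j, g in enumerate(fingerprints) if j != i
        )
    return total / len(fingerprints)


def meets_database_criteria(fingerprints, noise):
    """True when more than 95% of fingerprints are visible and the mean
    nearest-neighbor separation exceeds three times the noise distance."""
    return (
        fraction_visible(fingerprints, noise) > 0.95
        and mean_nearest_neighbor_distance(fingerprints) > 3 * noise
    )
```

The nearest-neighbor search here is quadratic in the number of compounds, which is acceptable for a database of roughly a thousand fingerprints.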

These databases are useful in a variety of contexts. 
By applying multivariate statistical methods, equally diverse
subsets can be obtained, so that it can be verified that a subset
selected from the database matches the diversity
represented by an alternative subset obtained by another method.
Multivariate statistics can also be used to select a subset of
maximal diversity for a defined size of the library; for
example, if the defined size is five times the number of members
of the reference panel, it can be used as a training set, as
described above. The database can also be used as a source for
a diverse set of chromatography ligands.
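
One simple way to pick a diverse subset of a defined size is greedy farthest-point selection, sketched below. This is an illustrative stand-in and not necessarily the multivariate method contemplated above; it assumes Euclidean distance between fingerprints, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two fingerprints of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse_subset(fingerprints, size):
    """Return indices of up to `size` fingerprints chosen greedily so that
    each new pick is as far as possible from everything already picked."""
    n = len(fingerprints)
    if size <= 0 or n == 0:
        return []
    dim = len(fingerprints[0])
    centroid = [sum(f[d] for f in fingerprints) / n for d in range(dim)]
    # Seed with the fingerprint farthest from the centroid.
    first = max(range(n), key=lambda i: euclidean(fingerprints[i], centroid))
    chosen = [first]
    # min_dist[i] = distance from fingerprint i to its nearest chosen member.
    min_dist = [euclidean(f, fingerprints[first]) for f in fingerprints]
    while len(chosen) < min(size, n):
        candidates = [i for i in range(n) if i not in chosen]
        nxt = max(candidates, key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, f in enumerate(fingerprints):
            min_dist[i] = min(min_dist[i], euclidean(f, fingerprints[nxt]))
    return chosen
```

For a training set sized at five times the number of panel members, `size` would be set to `5 * n_panel_members`, where `n_panel_members` is whatever panel size is in use.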

The following examples are intended to illustrate but 
not to limit the invention. 

Example 1 

Factors Determining Minimum Requirements 
for Surrogate Construction 

In order to construct a surrogate, both the reference 
panel and the training set must be adequate. In order to obtain 
successful candidate compounds for a desired property, the 
library must be adequate as well. 

Confirmation that the reference panel contains an 
adequate number of properly chosen proteins can be accomplished 
by obtaining an X-Y plot of the distance between points in
n-dimensional space (X axis) versus the frequency of this
observed distance (Y axis) (distance distribution). It will be
recalled that each point in n-dimensional space represents the
profile obtained for a single compound from the compound library
with respect to a reference panel of n members. The height,
shape, and maximum span of this distance distribution provide
information as to the adequacy of the panel and the library.
Ideally, a Poisson distribution should be obtained where the 
maximum of the distribution is at a high value of the distance 
between pairs. 
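
The distance distribution described above can be tabulated with a few lines of code. The sketch below assumes Euclidean distance between fingerprints and a fixed histogram bin width; both are illustrative choices, and the function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two fingerprints of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_distribution(fingerprints, bin_width=1.0):
    """Histogram of all pairwise distances.

    Returns (counts, max_span), where counts[k] is the number of compound
    pairs whose distance falls in [k * bin_width, (k + 1) * bin_width) and
    max_span is the largest pairwise distance observed.
    """
    dists = [euclidean(a, b) for a, b in combinations(fingerprints, 2)]
    if not dists:
        return [], 0.0
    max_span = max(dists)
    counts = [0] * (int(max_span // bin_width) + 1)
    for d in dists:
        counts[int(d // bin_width)] += 1
    return counts, max_span
```

Plotting `counts` against bin position gives the X-Y plot discussed in this example; the position of the tallest bin and the value of `max_span` are the quantities compared across panels and libraries.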

Figures 7a, 7b and 7c represent the distance 
distributions for the same set of compounds with respect to 
reference panels containing 5, 7 and 10 proteins, respectively. 
It is seen that when only five proteins are used in the panel, 
the shape of the distribution is somewhat irregular and the most 
frequent distance between points is relatively low. However, 
when the number of proteins in the panel is increased, a more 
regularly shaped Poisson distribution emerges with a larger 
distance between points at its maximum. The number of members 
in the panel is adequate when further addition of members fails 
to improve the position and shape of this distribution.

Conversely, Figures 8a-8c reflect progress toward
achieving an ideal distribution by a simple increase in the 
number of randomly chosen compounds in the compound library. 
The plot of pair-wise distances among compounds in a chemical 
library should provide a random distribution of distances if the 
collection of compounds is complete. If there are 
discontinuities, the collection is incomplete. In addition, 
large values of the maximum distance between members of a pair 
indicate more diversity in a set of compounds. This is 
illustrated in Figures 8a-8c. Figure 8a shows the frequency vs.
distance plot for points representing fingerprints determined 
against a set of ten reference proteins for 50 compounds 
selected at random. The data do not result in a Poisson 
distribution and the maximum span of the distance is slightly 
over eight units. Figure 8b shows similar results when the 
fingerprints of 100 compounds are included; the distribution has 
become more regular and the maximum span has increased to 
approximately 12 units. When fingerprints for 1000 compounds 
were obtained and compared, the maximum separation between the 
points in n-dimensional space reaches 15 units and the 
distribution assumes the typical Poisson shape (Figure 8c) . 

Similar comparisons can be used to evaluate the 
adequacy of smaller numbers of putatively more representative 
compounds, for example, to evaluate the adequacy of 
combinatorial libraries comprised entirely of peptides. Figure 
9b shows the distance distribution for 50 commercially available 
drugs. Comparing this distribution with that shown in Figure 9a 
(same as Figure 8a) for 50 structurally diverse random compounds 
reveals that the distributions are quite similar. However, when 
these distributions are compared to that obtained for a library 
of peptides ranging from dipeptides to 32-mers, as shown in 
Figure 9c, the portion of the space spanned is more than a unit 
smaller. This leads to the conclusion that peptide libraries 
per se may be inadequate to represent all of chemical space. 

The character of the distance distribution can also be 
used as a measure of the diversity of a particular set of 
candidate compounds, for example substances available as 
chromatographic ligands. Using the distance distribution as a
criterion, a minimum number of ligands may be supplied to offer
the widest possible spectrum of separation efficiency. In other
words, such distance distributions can be used to verify the
maximal diversity of panels of chromatographic ligands
constructed as described in U.S. Patent 4,963,263, incorporated
herein by reference, or to select nonpolymeric compounds to
serve as diverse chromatographic ligands.

Example 2 
Discovery of Additional NSAIDs 

A database of fingerprinted compounds which included
fenoprofen, flufenamic acid, ibuprofen, endoprofen, ketoprofen,
mefenamic acid, naproxen, piroxicam, and sulindac was prepared.
A panel of proteins was prepared; these were either obtained
commercially or expressed recombinantly in E. coli and purified.
All of the proteins in the initial panel were enzymes, and an
IC50 was determined in an enzymatic assay. A revised panel
included other proteins, and binding could be determined by
fluorescence polarization. None of these proteins had any
homology to the target of the NSAIDs, cyclooxygenase.

Verification of predicted COX inhibitors was done by
assessing COX activity in the presence and absence of the
fingerprinted compound. Both COX-I from ram and COX-II from
sheep were tested. The assays were conducted by incubating the
enzyme at 37 °C in 0.1 mM arachidonic acid contained in 0.1 M
TRIS, pH 8.0, with 20 mM phenol. The reaction was stirred
vigorously for 3 minutes to maintain a significant concentration
of dissolved oxygen. The reaction was then stopped by addition
of 5 mM citric acid and samples were diluted. The PGE2
concentration was measured by EIA using a standard kit from
Cayman Chemical.

The first several hundred compounds tested against the
initial panel of 10 enzymes included ibuprofen and
indomethacin, two known NSAIDs. The 10 enzymes in this panel
are those shown herein in Figure 10 and listed in Example 3
below. Both compounds shared common features in their
fingerprints that were tentatively considered to be diagnostic
of an NSAID. In evaluating the remaining compound fingerprints,
12 additional compounds were selected which shared this feature.
These were evaluated for their ability to inhibit COX-I and
COX-II. Two COX-I inhibitors were found with moderate affinity
and one COX-I inhibitor of low affinity. Thus, nine compounds
which had been tentatively identified did not inhibit these
enzymes, but two new leads were found without screening the
whole library against COX. These novel leads are significantly
different in structure from the known NSAIDs ibuprofen and
indomethacin.

The reference panel was then revised to include
enzymes which enriched the panel by expanding the range of
chemicals that can be fingerprinted, and by increasing the
average and maximal distances between fingerprints. The
proteins in the panel are those shown in Figure 13 and
listed in Example 4 below. The compounds were refingerprinted.
Of the nine compounds originally selected that did not
inhibit COX, only two remained putatively similar to the NSAID
profile against this revised panel.

The revised panel was used to fingerprint a group of
13 unidentified compounds and the fingerprints were compared to
the NSAID consensus fingerprint obtained from ibuprofen and
indomethacin. When compared, seven of the compounds exhibited
features which predicted they would inhibit COX and six led to
the prediction that they would not. The fingerprints accurately
identified flosulide, phenylbutazone, pirprofen, prinomid,
oxindanac, oxindanac analog, and diclofenac as inhibitors of COX
and the compounds chlordiazepoxide, maprotiline, imipramine,
metoprolol, and pentopril as noninhibitors. Among the predicted
noninhibitors was also included a diclofenac prodrug which
itself does not inhibit this enzyme.
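
The comparison of candidate fingerprints to a consensus fingerprint can be sketched as follows. The text does not specify how the consensus or the "common feature" was computed, so this sketch makes its own assumptions: the consensus is the element-wise mean of the known actives' fingerprints, similarity is Euclidean distance, and the cutoff is a user-chosen value. The names and numbers in the usage note are placeholders, not data from this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two fingerprints of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def consensus_fingerprint(known_fingerprints):
    """Element-wise mean of the fingerprints of the known active compounds."""
    n = len(known_fingerprints)
    return [sum(column) / n for column in zip(*known_fingerprints)]


def predict_inhibitors(candidates, consensus, cutoff):
    """Names of candidates whose fingerprint lies within `cutoff` of the
    consensus. `candidates` maps a compound name to its fingerprint."""
    hits = [
        name
        for name, fingerprint in candidates.items()
        if euclidean(fingerprint, consensus) <= cutoff
    ]
    return sorted(hits)
```

For instance, with a consensus built from two placeholder fingerprints `[1.0, 3.0]` and `[3.0, 5.0]`, a candidate at `[2.0, 4.5]` falls within a cutoff of 1.0 while a candidate at `[6.0, 1.0]` does not.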

Example 3

Construction of a Surrogate

In this example, the reference panel members whose
profiles were obtained with respect to a training set of
compounds were isoenzymes of glutathione-S-transferase (GST).
The reference panel containing ten such isoenzymes is shown at
the top of Figure 10. The target in this example was
glutathione reductase (GRd), shown at the right. The first 20
compounds listed on the left were used as a training set and,
when tested for binding to glutathione reductase, generated the
profile marked GRd at the right. In this "gray scale", the
darker the square, the more tightly the compound is bound; the
lighter, the less tightly bound. The list of compounds and
abbreviations is provided at the left of Figure 10.

For the reference panel, GSTs A1-1, P1-1, M1a-1a and
M2-2 were provided as recombinant human enzymes; R1-1 and R8-8 are
rat enzymes of the alpha class; R1(25)-8 is a site-directed
mutant of R8-8. HF2 and HF3 are house fly GST enzymes purified
by hexyl-glutathione affinity chromatography from cell lines
provided by M. Syvanen at UC Davis; schistosome GST S1 is
available from Pharmacia as part of a fusion protein cloning
vector. Yeast glutathione reductase was purchased from Sigma.

In order to test the degree of binding between GSTs
and the compounds on the left of the table, five serial 5-fold
dilutions from 250 µM to 0.4 µM were tested and the 50% inhibition
concentration (IC50) was calculated from a curve fitted to the
data. For compounds with an estimated IC50 below 0.4 µM,
additional dilutions were tested until the true IC50 was
bracketed. Four of the GSTs and 20 compounds were selected as
maximally diverse. The IC50s are indicated in the figure on a
scale of less than 0.4 µM, less than 2.0 µM, less than
10.0 µM, less than 50 µM, less than 250 µM, and less than 1000 µM.
Thus, IC50s of less than 0.4 µM would appear black on this scale;
those with IC50s of less than 1000 µM would appear white.
Intermediate values are varying shades of gray.
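
The dilution series and the gray-scale binning described above can be reproduced with the following sketch. The numeric thresholds come from the text; the integer gray levels (0 for black through 5 for white) and the treatment of IC50 values at or above 1000 µM as white are assumptions made for illustration, and the function names are hypothetical.

```python
def serial_dilutions(start_um=250.0, factor=5.0, steps=5):
    """Concentrations (µM) of a serial dilution: 250, 50, 10, 2, 0.4 by default."""
    return [start_um / factor ** k for k in range(steps)]


# Upper bounds (µM) of the six bins named in the text.
GRAY_THRESHOLDS_UM = [0.4, 2.0, 10.0, 50.0, 250.0, 1000.0]


def gray_level(ic50_um):
    """Map an IC50 in µM to a gray level: 0 = black (tightest binding),
    5 = white (weakest). Values of 1000 µM or more are also shown as
    white, which is an assumption of this sketch."""
    for level, threshold in enumerate(GRAY_THRESHOLDS_UM):
        if ic50_um < threshold:
            return level
    return len(GRAY_THRESHOLDS_UM) - 1
```

Each threshold is five times the previous one except the last step from 250 µM to 1000 µM, which mirrors the scale quoted in the text.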

The column marked "Fitted Predicted Values" in
Figure 10 is obtained by a linear combination of the results for
the four enzymes used in the panel of reference receptors tested
against the 20 compounds that come first in the chart. This
same fitting combination was then used to predict GRd binding to
the remaining compounds. The predicted results are compared
with the actual results against target in the right-hand columns
of the figure. A good correlation is obtained; the regression
coefficient is 0.8 with a dispersion factor of 0.7, as shown in
Figures 11a and 11b; this is more than adequate for making
predictions on new compounds. Figure 11a shows the data for the
80 test compounds of Figure 10 not used in the fitting procedure
and Figure 11b shows the residuals (experimental - predicted) from
Figure 11a.

The mathematical form for the linear regression is: 

\[
\log(\mathrm{IC}_{50})_{i,T} = \sum_{j=1}^{n} C_{R_j} \, \log(\mathrm{IC}_{50})_{i,R_j} \tag{1}
\]

As shown in this formula, the IC50 of compound i is
measured against target T or reference protein Rj, with each
reference term weighted by a fitted coefficient C_Rj.
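
A minimal sketch of fitting the coefficients of Equation (1) by ordinary least squares is given below. It assumes the training data are supplied as one row of log IC50 values per compound (one column per reference protein) together with the measured log IC50 against the target, and it solves the normal equations directly. The helper names are hypothetical, and no intercept term is fitted; a constant can be accommodated by appending a column of ones to each row.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: reference profiles are redundant")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_surrogate(reference_logs, target_logs):
    """Least-squares coefficients C_Rj such that, for each training compound,
    target_log is approximated by sum_j C_Rj * reference_log_j."""
    n = len(reference_logs[0])
    xtx = [
        [sum(row[i] * row[j] for row in reference_logs) for j in range(n)]
        for i in range(n)
    ]
    xty = [
        sum(row[i] * y for row, y in zip(reference_logs, target_logs))
        for i in range(n)
    ]
    return solve_linear_system(xtx, xty)


def predict(coefficients, reference_row):
    """Apply the fitted linear combination to one compound's reference profile."""
    return sum(c * x for c, x in zip(coefficients, reference_row))
```

Once fitted on the training set, `predict` is applied to the reference-panel profile of any new compound, which is how a prediction can be made without the target being physically present.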

The successful correlation obtained above is
surprising since the GRd derived from yeast is an NADPH-dependent
protein which has a different enzymatic function from GST.
These enzymes share no sequence homology, and comparison of the 
crystal structures of GST and GRd reveals no tertiary structural 
similarities. The common use of glutathione does not appear to 
contribute to the correlation since the six peptide variants of 
glutathione which do bind various GSTs do not bind particularly 
well to GRd. 



Example 4 

Improved Reference Panel/Compound Library Combinations
The general procedure set forth in Example 3 was 
followed but using a different reference panel and expanded 
compound library. 

An initial set of eight proteins was chosen by 
preliminary screening of about 100 proteins generally expected 
to display a broad cross-reactivity against small organic 
molecules. The eight panel members were chosen based on
enriching the panel of GSTs used in Example 3, as described
above. Four of the final panel members were
glutathione-S-transferase (GST) isoenzymes: human A1, rat R8,
housefly HF2 and schistosome S1. The remaining panel members
were D-amino acid oxidase (DAO) from porcine kidney
(EC 1.4.3.3); butyryl cholinesterase (BCh) from horse serum
(EC 3.1.1.8); papain (Pap) (EC 3.4.22.2); and snake venom
phosphodiesterase I (PDE) from Crotalus adamanteus
(EC 3.1.4.1). Cross-reactivity profiles
were obtained with respect to this panel of eight proteins for a
representative sample of 122 diverse compounds listed, along
with their identification codes, in Figure 12.

For convenience, in determining the fingerprints, the
binding of each compound to each protein was quantified as the
concentration needed to inhibit 50% of the protein's activity
(IC50). The IC50 values ranged over more than four log units from
1 mM to less than 0.05 µM.

A subset of 12 of the 122 compounds initially tested
was chosen based on a high selectivity of these compounds for
one or another of the proteins in the reference panel. This
initial training set of 12 compounds was assayed for inhibitory
activity with respect to two target enzymes: glutathione
reductase (GRd) and aldehyde dehydrogenase (AdDH). These two
proteins are not related to each other and are not related by
amino acid homology or activity to any of the reference proteins
in the panel. The 12 compounds selected for the training set
are the first 12 compounds for which results are shown in Figure
13. A surrogate was obtained based on this training set by
applying a linear regression to the data to obtain the
coefficients in Equation (1) above. This resulted in the
following regression equations for this iteration:

For glutathione reductase: 0.11 BCh + 0.19 HF2 + 1.79;

For aldehyde dehydrogenase: 0.55 PDE + 1.35.

The resulting surrogate was used to choose (for each
target) a second set of 10 compounds (from the remaining 110)
that were expected to be more representative of the range of
potencies for the targets. These 10 compounds (marked by
vertical bars in Figure 13, and different in most instances for
each target) were then tested directly against the targets
and the data obtained from these tests were used to
supplement the results from the first 12 compounds, providing a
total training set of 22 compounds for each target. Linear
regression applied to this newly defined training set yielded
the following forms of Equation (1) for the two targets:

For glutathione reductase: 0.21 BCh + 0.72 HF2 + 0.24 S1 - 0.05;

For aldehyde dehydrogenase: 0.58 PDE + 0.25 R8 + 0.43.
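
The second-iteration equations can be applied to a compound's reference-panel fingerprint as in the following sketch. The coefficients and constants are those quoted above; the interpretation that each protein abbreviation stands for the compound's log-scale potency against that reference protein follows Equation (1) and is an assumption here, as are the dictionary layout and function name.

```python
# Second-iteration surrogates quoted in the text: (coefficients, constant).
GRD_SURROGATE = ({"BCh": 0.21, "HF2": 0.72, "S1": 0.24}, -0.05)
ADDH_SURROGATE = ({"PDE": 0.58, "R8": 0.25}, 0.43)


def predict_potency(fingerprint, surrogate):
    """Predicted log-scale potency against the target.

    `fingerprint` maps reference-protein abbreviations to the compound's
    measured log-scale potency against each protein; panel members that do
    not appear in the surrogate are simply ignored.
    """
    coefficients, constant = surrogate
    return constant + sum(
        weight * fingerprint[protein] for protein, weight in coefficients.items()
    )
```

For a hypothetical fingerprint with BCh = 1.0, HF2 = 2.0 and S1 = 3.0, the glutathione reductase surrogate gives 0.21 + 1.44 + 0.72 - 0.05 = 2.32.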

The predictions based on this second iteration for the 
remaining 100 compounds were then compared with the actual 
empirical values measured separately as shown in Figures 14a and 
14b. Each of these graphs represents a correlation plot of the 
-log IC50 for target obtained experimentally (on the X-axis) with
the predicted -log IC50 (on the Y-axis).

The statistical parameters thus obtained showed that a 
reasonable correlation was obtained and that the correlation was 
improved in the second iteration. For glutathione reductase, 
the regression coefficient (R) , which measures the correlation 
between experiment and prediction, was 0.72 for the first 
iteration and 0.85 for the second. The dispersions (σ), which
measure scatter around the regression line for the training set 
or the prediction set, were 0.22 and 0.59 respectively for 
iteration 1 and 0.41 and 0.46 respectively for iteration 2. The 
F test value (F) measuring the improvement of fit as the ratio 
of dispersion for the current fit compared to the previous 
iteration, using random data for the initial comparison, was 4.7 
for iteration 1 and 15.9 for iteration 2. 

For aldehyde dehydrogenase, R was 0.4 for iteration 1 
and 0.86 for iteration 2, a considerable improvement. The 
sigmas for the training set and the prediction set were 0.51 and 
0.6 respectively for iteration 1 and 0.50 and 0.48 respectively 
for iteration 2. The F value was 6.9 for iteration 1 and 27.4 
for iteration 2. 
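
The statistics reported above can be approximated with the sketch below. The exact definitions used in the text are not given, so this sketch assumes that R is the Pearson correlation between experimental and predicted values and that the dispersion is the root-mean-square residual; the cited references may use slightly different denominators. Function names are hypothetical.

```python
import math


def correlation_coefficient(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dispersion(observed, predicted):
    """Root-mean-square residual between observed and predicted values."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))
```

Computing `dispersion` separately on the training compounds and on the held-out compounds gives the paired training-set and prediction-set values quoted for each iteration.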

The mathematical techniques employed to generate the 
foregoing data are described in Green, J.R. et al., "Statistical
Treatment of Experimental Data" (Elsevier, Amsterdam 1978) and
Massart, D. et al., Chemometrics (Elsevier, New York 1988).

Example 5 
Additional Target Correlations 
The techniques described in Example 4 were applied to
various targets in addition to aldehyde dehydrogenase and
glutathione reductase using a panel of 13 proteins which had
been further enriched over the panel of Example 4 in the same
fashion that Example 4 used a panel enriched with respect to
that of Example 3. Surrogates were constructed against the
additional targets: estrogen receptor, glycerol kinase,
schistosome GST, nucleoside 5'-diphosphate kinase, human Factor
Xa, trypsin and glyoxalase I. In each instance, a diverse set
of 15-50 compounds, drawn from a database catalog of over 1,000
compounds, was used for the fitting. For each determination,
the panel included at least the following proteins: GST A1-1;
acid α-1 glycoprotein; GST P1-1; human serum albumin; papain;
GST Rat 12:12(0); GST Housefly 3; butyryl cholinesterase; GST
Rat 8:8; trypsin; and alcohol dehydrogenase. Of course, trypsin
was not included in the panel for which trypsin was the
counterpart target. In some instances, plasmin was substituted
for GST Rat 8:8 and/or antitrypsin was substituted for alcohol
dehydrogenase.

These surrogates were correlated with experimentally
determined binding as shown in Figure 15. Correlations
generally showed a good match between the surrogate and the
actual target. In each case, a different linear combination of
the reference proteins provided the best fit. In all cases,
there is no sequence homology between target and fitting
proteins.

Claims

1. A method to identify a candidate reactive with a 
target, which method comprises: 

(a) providing a formula that represents a combination 
of the reactivity profiles with respect to a first set of 
candidates of at least two members of a reference panel, which 
formula calculates a predicted profile that best matches the 
reactivity profile of the target with respect to said first set 
of candidates; 

(b) testing the reactivity of said at least two 
members of the panel with respect to a candidate; and 

(c) calculating a predicted reactivity with respect 
to the target for said candidate by applying said formula to the 
reactivities determined in step (b) to estimate the reactivity 
of the candidate with respect to the target.

2. The method of claim 1 wherein the reactivities in
step (b) are tested with respect to each member of a library of
candidates, and/or

wherein the combination in (a) is a linear
combination.

3. The method of claim 1 wherein application of step 
(c) results in compounds which are estimated to react well and 
compounds that are estimated to react poorly with a target, and 

at least some of the compounds which are estimated to 
react well and at least some of the compounds which are 
estimated to react poorly with the target are added to the first 
set of candidates, to generate a second set of candidates and 
step (a) is repeated with said second set of candidates to 
obtain an improved formula, and 

which further includes 

(d) testing the reactivity of said at least two 
members of said reference panel with respect to a candidate; and

(e) calculating a predicted reactivity with respect
to the target for said candidate by applying said formula to the
reactivities determined in step (d) to estimate the reactivity
of the candidate with respect to the target.

4. A method to prepare a substance that reacts with
a desired target which method comprises:

(a) providing a formula that represents a combination
of the reactivity profiles with respect to a first set of
candidates of at least two members of a reference panel which
formula calculates a predicted profile that best matches the
reactivity profile of the target with respect to said first set
of candidates;

(b) testing the reactivity of said at least two
members of the panel with respect to a candidate;

(c) calculating a predicted reactivity with respect
to the target for said candidate by applying said formula to the
reactivities determined in step (b) to estimate the reactivity
of the candidate with respect to the target;

(d) identifying a substance as being a candidate
predicted to react with the target, and

(e) assembling the identified substance from starting
materials appropriate to said substance.

5. A method to construct a reference panel for
predicting reactivity of a candidate for a target which method
comprises:

arbitrarily identifying an initial set of panel
members;

obtaining profiles of reactivity for an initial set of
arbitrarily chosen compounds with respect to said initial set of
panel members;

comparing the profiles obtained;

discarding compounds and panel members which result in
redundant profiles;

substituting additional provisional panel members and
compounds for the panel members and compounds discarded to
obtain a second set of panel members and a second set of
compounds;






WO 95/18969 



PCT/US95/00081 




obtaining profiles for the second set of compounds 
with respect to said second set of panel members; 

again comparing the profiles obtained and discarding 
compounds and panel members that result in redundant profiles; 


and 

repeating the foregoing steps until a panel which 
covers at least 90% of chemical space is obtained; and/or 

which provides at least 5 principal components with 
respect to the range of compounds marketed as small organic 
molecules; and/or

wherein for said panel, the average of the differences 
between a profile for any first compound from that of any second 
compound is at least three times the differences observed for 
repeated determinations of the profile of said first compound. 

6. A reference panel of members comprising proteins, 
wherein at least one of the members of the panel is a protein 
other than an immunoglobulin (Ig) or fragment thereof; and 

wherein said panel provides members binding in a 
multiplicity of differing degrees with respect to a population 
of compounds; and 

wherein the presence of said non-Ig protein enriches 

the panel. 

7. The panel of claim 6 wherein said panel covers
90% of chemical space; and/or 

wherein said panel provides at least 5 principal 
components with respect to the range of compounds marketed as 
small organic molecules; and/or 

wherein for said panel, the average of the differences 
between a profile for any first compound from that of any second 
compound is at least three times the differences observed for 
repeated determinations of the profile of said first compound. 

8. The panel of claim 6 wherein said panel comprises
at least 2 enzymes or at least 2 lectins or at least 2 T cell 
receptors or at least 2 olfactory receptors, and/or wherein said 
panel comprises at least 10% non-Ig proteins.

9. A method to characterize a single analyte, which
method comprises:

contacting said analyte with each member of a 
panel of claim 8, wherein said compound is the analyte; 

detecting the degree of reactivity of said 
analyte to each of said members; 

recording said degree of reactivity of said 
analyte to each of said members; and 

arranging said recorded degrees of reactivity so 
as to provide a characteristic profile of said analyte. 



10. A method to identify a candidate which candidate 
will be effective in reacting with a target, wherein said target 
has a known ligand with which it reacts, which method comprises: 

contacting said candidate with each member of the
panel of claim 8 wherein said compound is the candidate;

detecting the degree of reactivity of said
candidate to each of said members;

recording each said degree of reactivity of said
candidate to each of said members;

arranging said recorded degrees of reactivity so
as to provide a characteristic profile of said candidate;

comparing said profile to a profile analogously
obtained of said ligand with respect to said multiplicity of
members;

wherein similarity of the profile of said
candidate to the profile of said ligand indicates the
probability that the candidate will react with said target.

11. The method of claim 10 wherein said comparing
includes the steps of:

determining a point obtained by plotting, in
n-dimensional space, the profile of reactivity of the candidate
for each member of the panel, wherein each n dimension
represents a different member of the panel and the reactivity of
the candidate with said each member is plotted in said each n
dimension; and

comparing the position of said point to the point in
said n-dimensional space determined for the profile representing
the reactivity of the known ligand for each member of the panel
wherein proximity of the points indicates the degree of binding
of the candidate to the target.

12. A method to prepare a substance that reacts with
a desired target wherein the target has a known ligand with
which it reacts which method comprises:

contacting a candidate with each member of the panel
of claim 8 wherein said compound is said candidate;

detecting the degree of reactivity of said candidate
to each of said members;

recording each said degree of reactivity of said
candidate to each of said members;

arranging said recorded degrees of reactivity so as to
provide a characteristic profile of said candidate;

comparing said profile to a profile analogously
obtained of said ligand with respect to said multiplicity of
members;

wherein similarity of the profile of said candidate to
the profile of said ligand indicates the probability that the
candidate will react with said target;

identifying a substance as a candidate having a high
probability of reacting with target that has a profile similar
to that of the ligand, and

assembling the identified substance from the starting
materials appropriate to said substance.

13. A method to select from a multiplicity of
candidates a candidate that reacts specifically with a known
target, which method comprises:

providing a profile of reactivity of said target
against a maximally diverse set;

preparing a profile of the reactivity of the
candidate with respect to the panel of claim 8 which panel is
the inverse image of the maximally diverse set;

comparing the maximally diverse set profile of
the target with the inverse image panel profile of the
candidate; and

wherein similarity of the inverse image panel 
profile with maximally diverse set profile indicates the 
probability that the candidate will bind to the target. 

14. The method of claim 13 wherein said comparing
includes the steps of: 

determining a point obtained by plotting, in 
n-dimensional space, the profile of reactivity of the candidate 
for each member of the inverse image panel, wherein each n 
dimension represents a different member of the inverse image 
panel and the reactivity of the candidate with said each member 
is plotted in said each n dimension; and 

comparing the position of said point to the point in 
said n-dimensional space determined for the profile representing 
the reactivity of the target for each member of the maximally 
diverse set wherein proximity of the points indicates the degree 
of binding of the candidate to the target.
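
By way of illustration only, the comparison recited in claim 14 may be sketched as follows. The claim does not name a distance measure; Euclidean distance is assumed here, and the function names and any numeric values are hypothetical. Each profile is treated as one point whose i-th coordinate is the reactivity with the i-th panel member, the i-th member of the inverse image panel being paired with the i-th member of the maximally diverse set.

```python
import math


def profile_distance(candidate_profile, target_profile):
    """Euclidean distance between two profiles viewed as points in n-space.

    Assumption: both profiles list reactivities in the same member order,
    so coordinate i of one corresponds to coordinate i of the other.
    """
    if len(candidate_profile) != len(target_profile):
        raise ValueError("profiles must cover the same number of panel members")
    return math.sqrt(
        sum((c - t) ** 2 for c, t in zip(candidate_profile, target_profile))
    )


def rank_candidates(candidates, target_profile):
    """Return candidate names ordered from nearest to farthest from the target point."""
    return sorted(
        candidates,
        key=lambda name: profile_distance(candidates[name], target_profile),
    )


if __name__ == "__main__":
    # Hypothetical three-member panel; values are made-up reactivities.
    target = [0.9, 0.1, 0.5]
    candidates = {
        "candidate_a": [0.8, 0.2, 0.5],
        "candidate_b": [0.1, 0.9, 0.2],
    }
    for name in rank_candidates(candidates, target):
        print(name, round(profile_distance(candidates[name], target), 3))
```

A smaller distance corresponds to closer proximity of the points and, under the claim, to a higher expected degree of binding. Other measures (for example a correlation coefficient) could be substituted without changing the overall scheme.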

15. A method to prepare a substance that reacts with
a desired target which method comprises:

providing a profile of reactivity of said target 
against a maximally diverse set; 

preparing a profile of the reactivity of the candidate
to the panel of claim 8 which is the inverse image of the
maximally diverse set;

comparing the maximally diverse set profile of the 
target with the inverse image panel profile of the candidate; 
and 

wherein similarity of the inverse image panel profile 
with the maximally diverse set profile indicates the probability 
that the candidate will bind to the target; 

identifying a substance as being a candidate having a
high probability of reacting with target that has a profile
similar to that of the receptor, and

assembling the identified substance from the starting
materials appropriate to said substance.

16. A method to select from a multiplicity of 
candidates a candidate that reacts specifically with a known 
target, which method comprises:

providing a profile of reactivity of said 
candidate against a maximally diverse set; 

preparing a profile of the reactivity of the 
target with respect to the panel of claim 8, which panel is the 
inverse image of said maximally diverse set;

comparing the maximally diverse panel profile of 
the candidate with the inverse image panel profile of the 
target; and 

wherein similarity of the inverse image panel
profile with maximally diverse set profile indicates the
probability that the candidate will bind to the target.

17. The method of claim 16 wherein said comparing
includes the steps of: 

determining a point obtained by plotting, in 
n-dimensional space, the profile of reactivity of the candidate 
for each member of the maximally diverse set, wherein each n 
dimension represents a different member of the panel and the 
reactivity of the candidate with said each member is plotted in 
said each n dimension; and

comparing the position of said point to the point in
said n-dimensional space determined for the profile representing
the reactivity of the target for each member of the inverse
image panel wherein proximity of the points indicates the degree
of binding of the candidate to the target.

18. A method to prepare a substance that reacts with
a desired target which method comprises: 

providing a profile of reactivity of a candidate 
against a maximally diverse set;

preparing a profile of the reactivity of the target to the
panel of claim 8 which is the inverse image of the maximally
diverse set;

comparing the maximally diverse set profile of the 
candidate with the inverse image panel profile of the target; 
and 

wherein similarity of the inverse image panel profile 
with maximally diverse set profile indicates the probability 
that the candidate will bind to the target; 

identifying a substance as being a candidate having a 
high probability of reacting with target that has a profile 
similar to that of the target, and 

assembling the identified substance from the starting 
materials appropriate to said substance. 

19. A fingerprint for a compound which is a 
characteristic profile obtained by the method of claim 9.

20. A database comprising multiple fingerprints of
claim 19.

21. A subset of the database of claim 20 selected to 
provide maximal diversity. 
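
As a non-limiting sketch, one way a maximally diverse subset of the kind recited in claim 21 might be chosen is greedy farthest-point (max-min) selection over the stored fingerprints. The claims do not prescribe any selection algorithm; the greedy rule, the Euclidean distance, the function names, and the choice of the first compound as the seed are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two fingerprints of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse_subset(fingerprints, k):
    """Greedily pick up to k compound names whose fingerprints are spread apart.

    Assumed rule: start from the alphabetically first compound, then repeatedly
    add the compound whose nearest already-chosen neighbour is farthest away.
    This is a heuristic, not a guaranteed optimum.
    """
    names = sorted(fingerprints)
    if k <= 0 or not names:
        return []
    chosen = [names[0]]
    while len(chosen) < min(k, len(names)):
        remaining = [n for n in names if n not in chosen]
        best = max(
            remaining,
            key=lambda n: min(
                euclidean(fingerprints[n], fingerprints[c]) for c in chosen
            ),
        )
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Hypothetical two-member-panel fingerprints.
    database = {
        "cmpd_1": [0.0, 0.0],
        "cmpd_2": [0.1, 0.0],
        "cmpd_3": [10.0, 0.0],
        "cmpd_4": [5.0, 0.0],
    }
    print(select_diverse_subset(database, 3))
```

The subset so obtained could then serve as a core screening set in the sense of claim 22, since near-duplicate fingerprints are passed over in favor of compounds that occupy different regions of the profile space.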

22. The subset of claim 21 for use as a core 
screening set, or for use as a source of chromatographic 
ligands.

23. A database which comprises the physical
embodiment of fingerprints for more than 1000 compounds against
a reference panel, said compounds representing a range of
binding with the panel members of at least three log units, and
wherein at least 95% of the compounds generate visible
fingerprints.
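
The numerical limits of claim 23 can be checked against a stored set of fingerprints along the following lines. This is only a sketch under stated assumptions: the claim does not define "visible", so a fingerprint is assumed here to be visible when at least one panel member gives a signal at or above a user-supplied detection limit, and the range of binding is assumed to be the base-10 logarithm of the ratio of the largest to the smallest positive value in the database. All names are hypothetical.

```python
import math


def fingerprint_is_visible(values, detection_limit):
    """Assumed criterion: at least one panel member reaches the detection limit."""
    return any(v >= detection_limit for v in values)


def database_summary(database, detection_limit):
    """Summarise compound count, log10 span of binding values, and visible fraction."""
    strengths = [v for values in database.values() for v in values if v > 0]
    log_range = math.log10(max(strengths) / min(strengths)) if strengths else 0.0
    visible = sum(
        fingerprint_is_visible(values, detection_limit)
        for values in database.values()
    )
    fraction = visible / len(database) if database else 0.0
    return {
        "compounds": len(database),
        "log_range": log_range,
        "visible_fraction": fraction,
    }


def satisfies_thresholds(summary):
    """More than 1000 compounds, at least three log units, at least 95% visible."""
    return (
        summary["compounds"] > 1000
        and summary["log_range"] >= 3.0
        and summary["visible_fraction"] >= 0.95
    )


if __name__ == "__main__":
    # Hypothetical database: 1001 compounds, each measured on a two-member panel.
    database = {f"cmpd_{i}": [1.0, 1000.0] for i in range(1001)}
    summary = database_summary(database, detection_limit=0.5)
    print(summary, satisfies_thresholds(summary))
```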

24. A subset of the database of claim 23 selected to 
provide maximal diversity.

25. The subset of claim 24 for use as a core
screening set, or for use as a source of chromatographic
ligands.

Code   Compound Name

Amines
A1     Fendiline
A2     4,4'-Diaminodiphenyl sulfone
A3     Dipyridamole
A4     Glafenine

Cephalosporins
C1     Cephaloglycin
C2     Cephapirin
C3     Cephalothin
C4     Cephradine

Dyes
D1     Cibacron brilliant red 3BA
D2     Cibacron brilliant yellow 3GP
D3     9-(Octadecylamino)acridine
D4     Acridine orange hydrochloride hydrate
D5     Phenyl 9-acridinecarboxylate

Aliphatics
F1     3-Hydroxy-1-methylpiperidine
F2     Fertilysin

Conjugated Aromatics
J1     Xanthurenic acid
J2     2,2'-(1,3-indenediformyl)dibenzoic acid
J3     Citrinin
J4     N-(2-amino-4-chlorophenyl)anthranilic acid
J5     α-Cyano-3-hydroxycinnamic acid
J6     Nalidixic Acid
J7     Lasalocid
J8     Quinaldic acid

Ketones
K1     5,5'-Dibromosalicil
K2     4,5-Diphenyl-1,3-dioxolan-2-one



Phenols
L1     Nordihydroguaiaretic acid
L2     Dienestrol
L3     Catechin
L4     Naringenin

Amides
M1     1,3-Di-p-tolyl-2-thiourea
M2     Indomethacin
M3     Colchicine
M4     Nimesulide

Nitro-aromatics
N1     5-(4-nitrophenyl)-2-furoic acid
N2     N-(4-Dimethylamino-3,5-dinitrophenyl)-maleimide
N3     4,5-Dichloro-2-nitroaniline
N4     2-(2,4-Dinitrostyryl) thiophene
N5     1-(5-Nitrofurfurylidine-2-imino)-2,4-dimethyl-5-cyano-6-pyridone
N6     Nitrofurantoin
N7     Furazolidone
N8     5-Nitro-2-furanacrolein
N9     5-Nitro-2-furaldehyde diacetate
N10    5-Nitro-2-furaldehyde semicarbazone
N11    1,5-Bis-(5-nitro-2-furyl)-1,4-pentadien-3-one
N12    tert-Butyl 5-nitro-2-thiophene carboxylate
N13    4-Nitro-N-(2-thienylmethylene)aniline
N14    N-(5-Nitro-2-pyridyl)-3,4,5,6-tetrachlorophthalamic acid
N15    N-(5-Nitro-3-pyridyl)phthalamic acid

Peptides
P1     γ-Glu-S-hexyl cys-glu
P2     γ-Glu-S-hexyl cys-phenyl gly
P3     γ-Glu-S-hexyl cys-β-ala
P4     γ-Glu-S-octyl cys-gly
P5     γ-Glu-S-butyl cys-gly
P6     γ-Glu-S-(β-methyl naphthyl)cys-gly

Quinones
Q1     1,2,3,4-Tetrafluoro-5,8-dihydroxy-anthraquinone
Q2     6-(Diethylaminomethyl)kojic acid

Steroids
S1     5α-Androstane-3β,17β-diol hydrate
S2     Cholic acid
S3     Lithocholic acid
S4     Deoxycholic acid
S5     Chenodeoxycholic acid
S6     Corticosterone
S7     Cymarin

Triazines
T1     2-Decanoyl-4,6-diamino-1,3,5-triazine
T2     Simazine
T3     4,6-Dihydroxy-1,3,5-triazine-2-acetic acid O-anisidide

Unconjugated aromatics
U1     Ibuprofen
U2     Indoprofen
U3     Fenoprofen
U4     (S)-6-methoxy-α-methyl-2-naphthaleneacetic acid, sodium salt
U5     Bis(4-chlorophenoxy)acetic acid
U6     Fenbufen
U7     2-(4-benzyloxyphenoxy)-2-methylpropionic acid
U8     2-(4-tert-butylphenoxy)acetic acid



U9     γ-Oxo-2-naphthalenebutyric acid
U10    2-(4-aminophenoxy)acetic acid hydrochloride
U11    2-(4-cinnamoylphenoxy)acetic acid
U12    2-(4-formylphenoxy)acetic acid

Xanthenes
X1     Erythrosin B
X2     phloxine B
X3     Fluoresceinamine, isomer II
X4     Pyrogallol red
X5     Fluorescein isothiocyanate, isomer I
X6     9-phenyl-2,3,7-trihydroxy-6-fluorone
X7     4-(6-hydroxy-3-oxo-3H-xanthen-9-yl)benzoic acid
X8     9-(4-(Dimethylamino)Phenyl)-2,6,7-Trihydroxy-3H-Xanthen-3-One Sulfate
X9     6-hydroxy-3-oxo-3H-xanthene-9-propionic acid
X10    9-(2,4-Dichlorphenoxymethyl)-6-hydroxy-3H-xanthen-3-one
X11    Dimethyl 4-(6-hydroxy-3-oxo-3H-xanthen-9yl) isophthalate

Miscellaneous
Z1     Pyrocatechol violet
Z2     Ajmaline
Z3     6-Chloro-3-nitro-2H-chromene
Z4     Cholecalciferol
Z5     1,1'-Dibenzoylferrocene
Z6     2,5-Diphenyloxazole
Z7     Ethaverine
Z8     Econazole
Z9     Harmaline
Z10    Quinine
Z11    Scopoletin

Code   Compound Name

Amines
A1     Butacaine
A2     4,4'-Diaminodiphenyl sulfone
A3     p-Dimethylaminobenzylaldehyde
A4     4,4'-bis(Dimethylamino)Benzhydrol
A5     Dipyridamole
A6     Fendiline
A7     Glafenine

Cephalosporins
C1     Cephaloglycin
C2     Cephapirin
C3     Cephalothin
C4     Cephradine
C5     Cephaloridine
C6     Cefoperazone
C7     Cefaclor

Dyes
D1     Cibacron brilliant red 3BA
D2     Cibacron brilliant yellow 3GP
D3     Acridine orange hydrochloride hydrate
D4     Phenyl 9-acridinecarboxylate
D5     Pyrocatechol violet

Aliphatics
F1     3-Hydroxy-1-methylpiperidine
F2     Fertilysin

Aromatic Heterocyclics
H1     6,7-dimethyl-2,3-di-(2-pyridyl)-quinoxaline
H2     Harmaline
H3     Quinine
H4     8-Chlorotheophylline
H5     Murexide

Conjugated Aromatics
J1     2,2'-(1,3-indenediformyl)dibenzoic acid
J2     Citrinin
J3     N-(2-amino-4-chlorophenyl)anthranilic acid
J4     Lasalocid
J5     Quinaldic acid
J6     Xanthurenic acid

J7     α-Cyano-3-hydroxycinnamic acid
J8     Flumequine
J9     Nalidixic Acid
J10    Norfloxacin

Ketones
K1     5,5'-Dibromosalicil
K2     4,5-Diphenyl-1,3-dioxolan-2-one
K3     Chalcone
K4     Ketotifen
K5     2-hydroxy-3-(naphthyl)-1,4-naphthoquinone
K6     1,2,3,4-Tetrafluoro-5,8-dihydroxy-anthraquinone
K7     Scopoletin

Phenols
L1     Nordihydroguaiaretic acid
L2     Dienestrol
L3     Catechin
L4     Naringenin
L5     Hesperetin

Amides
M1     α-(4-chlorophenyl)-α-(dihydro-oxo-pyridylmethyl)-im[illegible]
M2     1,3-Di-p-tolyl-2-thiourea
M3     dansylamide
M4     Nimesulide
M5     Chloramphenicol Base
M6     Colchicine
M7     Oxolamine

Nitro-aromatics
N1     5-(4-nitrophenyl)-2-furoic acid
N2     N-(4-Dimethylamino-3,5-dinitrophenyl)-maleimide
N3     4,5-Dichloro-2-nitroaniline
N4     2-(2,4-Dinitrostyryl) thiophene
N5     tert-Butyl 5-nitro-2-thiophene carboxylate
N6     4-Nitro-N-(2-thienylmethylene)aniline
N7     N-(5-Nitro-2-pyridyl)-3,4,5,6-tetrachlorophthalamic acid
N8     N-(5-Nitro-3-pyridyl)phthalamic acid

Peptides
P1     γ-Glu-S-hexyl cys-glu
P2     γ-Glu-S-hexyl cys-phenyl gly
P6     γ-Glu-S-(β-methyl naphthyl)cys-gly
P7     Met-Leu-Phe

Steroids
S1     5α-Androstane-3β,17β-diol hydrate
S2     Cholic acid
S3     Lithocholic acid
S4     Deoxycholic acid
S5     Chenodeoxycholic acid
S6     Corticosterone
S7     Cymarin
S8     β-Escin

Triazines
T1     2-Decanoyl-4,6-diamino-1,3,5-triazine
T2     Simazine
T3     4,6-Dihydroxy-1,3,5-triazine-2-acetic acid O-anisidide

Unconjugated aromatic acids
U1     Bis(4-chlorophenoxy)acetic acid
U2     2-(4-biphenyloxy)propionic acid
U3     2-(4-(fluorosulfonyl)phenoxy)acetic acid
U4     2-(4-benzyloxyphenoxy)-2-methylpropionic acid
U5     2-(4-tert-butylphenoxy)acetic acid
U6     γ-Oxo-2-naphthalenebutyric acid
U7     2-(4-aminophenoxy)acetic acid hydrochloride
U8     2-(4-cinnamoylphenoxy)acetic acid
U9     2-(4-formylphenoxy)acetic acid
U10    Ibuprofen
U11    Indomethacin
U12    Indoprofen

U13    Fenoprofen
U14    (S)-6-methoxy-α-methyl-2-naphthaleneacetic acid, sodium salt
U15    gemfibrozil
U16    podocarpic acid
U17    Fenbufen

Xanthenes
X1     Erythrosin B
X2     phloxine B
X3     Fluoresceinamine, isomer II
X4     Pyrogallol red
X5     Fluorescein isothiocyanate
X6     9-phenyl-2,3,7-trihydroxy-6-fluorone
X7     4-(6-hydroxy-3-oxo-3H-xanthen-9-yl)benzoic acid
X8     2-(6-hydroxy-3-oxo-3H-xanthen-9-yl)-cyclohexanecarboxylic acid
X9     6-hydroxy-3-oxo-3H-xanthene-9-propionic acid
X10    9-(2,4-Dichlorphenoxymethyl)-6-hydroxy-3H-xanthen-3-One
X11    Dimethyl 4-(6-hydroxy-3-oxo-3H-xanthen-9yl) isophthalate
X12    9-(4-(Dimethylamino)Phenyl)-2,6,7-Trihydroxy-3H-Xanthen-3-One Sulfate

Miscellaneous
Z1     1-thio-β-D-glucose tetraacetate
Z2     Econazole
Z3     Taxol
Z4     Ajmaline
Z5     6-Chloro-3-nitro-2H-chromene
Z6     Cholecalciferol
Z7     1,1'-Dibenzoylferrocene
Z8     2,5-Diphenyloxazole
Z9     Ethaverine
Z10    Iodonitrotetrazolium Formazan
Z11    1-(Mesitylene-2-Sulfonyl) Imidazole
Z12    Olivetol